Clarify what sort of biomedical data ARAs will be required to obtain … #5

dkoslicki · 2020-04-24T16:01:10Z

…from KPs or other translator resources.

In particular, while obtaining Message nodes and edges from Translator resources makes complete sense imho, ARAs frequently require additional information to perform reasoning. Examples include:

General use case: utilizing KS (and perhaps KP) data dumps to speed performance.
- Specific use case: NCBIeUtils is a slow API endpoint. When an ARA receives/creates a Message KG with many thousands of nodes and edges, the ARA may wish to annotate all pairs of nodes or all edges with co-occurrence frequency in PubMed literature. A locally cached version of PubMed/baseline stored in a local database lets these (tens of) thousands of queries run quickly. Even though a Translator KP could technically provide a faster API endpoint than NCBIeUtils for this purpose, every query would still pay a network round-trip penalty that a local database avoids, i.e. keep the usual latency numbers in mind.
General use case: requiring locally aggregated graphs or other non-Translator resources for machine learning/reasoning purposes.
- Specific use case: An ARA may need access to a specifically formatted graph (not associated with any particular Message) in order to utilize graph convolutional neural network methods (or node embedding methods for downstream traditional ML techniques). The ARA would store and use this information for link prediction, answer scoring, etc. purposes. It seems unreasonable to require a KP to provide this resource for a particular ARA.
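
To make the first use case concrete, here is a minimal sketch of annotating every pair of nodes with a co-occurrence count from a local cache instead of a remote API. It assumes a hypothetical table `mentions(pmid, curie)` built from a PubMed dump; the table name, the helper functions, and the toy identifiers are illustrative only and do not describe any existing ARA or KP.

```python
import sqlite3
from itertools import combinations

def build_cache(rows):
    """Load (pmid, curie) rows into an in-memory SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mentions (pmid INTEGER, curie TEXT)")
    conn.execute("CREATE INDEX idx_curie ON mentions (curie)")
    conn.executemany("INSERT INTO mentions VALUES (?, ?)", rows)
    return conn

def cooccurrence(conn, a, b):
    """Count articles that mention both identifiers."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM ("
        "SELECT pmid FROM mentions WHERE curie = ? "
        "INTERSECT "
        "SELECT pmid FROM mentions WHERE curie = ?)",
        (a, b),
    ).fetchone()
    return count

def annotate_pairs(conn, node_ids):
    """Return {(a, b): count} for every unordered pair of node identifiers."""
    return {
        (a, b): cooccurrence(conn, a, b)
        for a, b in combinations(sorted(node_ids), 2)
    }

# Toy data standing in for a real PubMed-derived cache.
rows = [
    (1, "CHEBI:15365"), (1, "MONDO:0005148"),
    (2, "CHEBI:15365"), (2, "MONDO:0005148"),
    (3, "CHEBI:15365"), (3, "HP:0000822"),
]
conn = build_cache(rows)
counts = annotate_pairs(conn, {"CHEBI:15365", "MONDO:0005148", "HP:0000822"})
```

Every lookup here is a local indexed query, so the number of pairs grows without a matching growth in network round trips.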


cbizon · 2020-04-26T21:22:32Z

@dkoslicki - these are great points, and I'd like to ask a few questions to help clarify my understanding.

So for use case 1, is it fair to say that this is mostly a concern about caching? If so, then I think it's mostly orthogonal to whether ARAs can get non-KP information. So, if caching of KPs were allowed, then I think we would want to make a KP for that information, such that any of the ARAs could use it. I guess one problem could be if the data being pulled is not amenable to being served via a ReasonerAPI, but I'm not sure if that's the problem here...

For use case 2, I agree that we need to be able to instantiate those aggregate graphs for all kinds of big-graph analysis. The question in my mind is whether they can be created from KPs. My hope is that they could be - the KGX tools developed by @cmungall and @deepakunni3 among others should allow this to happen pretty easily? Or do you see a need to bring in non-KP information into such a graph?

dkoslicki · 2020-04-27T23:38:00Z

@cbizon

For use case 1, yes this is “mostly” a concern about caching. Hitting an API is significantly slower than hitting a database stored in memory. While KPs could provide endpoints that allow bulk download of the data:

- Your point that ReasonerAPI is perhaps not amenable to the task is well taken (and is apparently on the agenda for the current “relay” breakout groups on Thursday).
- Some data sources are not provided by any KP (e.g. we (Team Expander Agent) have previously used the Veteran’s Association National Drug File as a cached source of information). It seems onerous to ask a KP team to roll out such a data source when we’re using it for a specific task (e.g. use case 2).

For use case 2: I think the aggregated graphs (from tools such as KGX) would be a great starting point, but additional (non-KP) information might be required. For example, some GNNs require graphs to have a specific format (easily done by post-processing, say, a KGX aggregated graph), but also require additional information decorating node/edge properties (specifically, numerical values related to non-KP derived (or only partially KP derived) training data, etc.). It would be nice to clarify whether it’s “ok” for ARAs to keep/control these modified graphs (originally derived from some Translator KP-aggregated graph(s)) as local sources of information to inform machine learning models.
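
As a rough illustration of the post-processing described here, the sketch below turns an aggregated edge list into the index-based form many GNN libraries expect and attaches one numeric feature per node. The function name `to_gnn_inputs`, the triple layout, and the `node_scores` mapping are assumptions made for this example; they are not part of KGX or of any Translator API.

```python
def to_gnn_inputs(edges, node_scores, default=0.0):
    """Convert (subject, predicate, object) triples into GNN-style inputs.

    edges:       iterable of (subject, predicate, object) triples
    node_scores: hypothetical locally derived numeric value per node
    Returns (node_ids, edge_index, features).
    """
    edges = list(edges)
    # Assign each node a stable integer index.
    node_ids = sorted({n for s, _, o in edges for n in (s, o)})
    index = {n: i for i, n in enumerate(node_ids)}
    # Edges become pairs of integer indices; predicates are dropped here.
    edge_index = [(index[s], index[o]) for s, _, o in edges]
    # One-dimensional feature vector per node, with a default for missing values.
    features = [[node_scores.get(n, default)] for n in node_ids]
    return node_ids, edge_index, features

# Toy graph and toy scores standing in for non-KP-derived training data.
edges = [("A", "treats", "B"), ("B", "related_to", "C")]
node_ids, edge_index, features = to_gnn_inputs(edges, {"A": 0.7, "C": 0.1})
```

The resulting structure is no longer tied to any particular Message, which is why the question of whether an ARA may store it locally matters.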

cbizon · 2020-05-05T12:51:57Z

@dkoslicki - did the Relay discussion help with the clarification here? I think that the outcome of that discussion was that:

1. Caching is fine, as long as the things being cached were from KPs.
2. If an ARA needs non KP-served data it should try to get a KP team to stand them up, but at least in the short term, standing up their own KP is ok.

dkoslicki · 2020-05-05T16:11:05Z

> @dkoslicki - did the Relay discussion help with the clarification here? I think that the outcome of that discussion was that:
>
> 1. Caching is fine, as long as the things being cached were from KPs.
> 2. If an ARA needs non KP-served data it should try to get a KP team to stand them up, but at least in the short term, standing up their own KP is ok.

Yup! I think that accurately summarizes the discussion. Though on point 2, "in the short term" may end up being longer than we think (as KP teams have their own milestones to prioritize). Might be wise to make the architecture doc just not say anything/much on this point (as I think is currently the case) until a KP/KS registry is settled on, a request/ticketing system is set up, etc.

jzollars · 2020-06-10T17:24:21Z

@NCATSTranslator/architecture-core: Test: Review Pull Request.

Rosinaweber

About No. 7:
[[ARAs obtain Message nodes and edges only via KPs (or other ARAs), not from locally-cached aggregated graphs or non-Translator data sources.]]
It sounds odd that ARAs can obtain Message nodes and edges from other ARAs: to provide those, wouldn't the providing ARAs have to cache graphs locally? Apologies if I'm missing something.

Clarify what sort of biomedical data ARAs will be required to obtain …

b5fc3de

…from KPs or other translator resources.

jzollars requested a review from a team June 10, 2020 19:44

Rosinaweber reviewed Oct 1, 2020
