feat: Add tool for extracting annotations of species in an ODE model #81
Conversation
Nice work 💪. I am very pleased to see the thought that went into fetching additional information from different ontologies and databases. @awmulyadi I think the APIs could be useful for you for the enrichment part. At least chemicals, proteins, genes, GO, and diseases appear to be covered as many are included in OLS.
I have several comments in regard to the testing and how the multiple OLS sub databases are handled. Please reach out if you have any questions.
```python
current_state = app.get_state(config)
dic_annotations_data = current_state.values["dic_annotations_data"]
print(dic_annotations_data)
assert isinstance(dic_annotations_data, list)
```
This is a good idea to test multiple models. However, simply testing that some data was created is not very rigorous. On top of the checks you already have that the state data was created properly, I would recommend:
- Prior to this test, a simple test to ensure that the outputs of `prepare_content_msg` are as expected.
- Using the expected string from `prepare_content_msg` for each of the different models for all species as the test case.
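A minimal version of the suggested test could look like the sketch below. The `prepare_content_msg` body here is a hypothetical stand-in (its real signature and wording live in the PR); only the testing pattern is the point.

```python
# Hypothetical stand-in for the PR's prepare_content_msg; the real
# signature and message wording may differ.
def prepare_content_msg(species_found, species_not_found):
    """Build a human-readable summary of the annotation lookup."""
    msg = "Successfully extracted annotations for the species."
    if species_not_found:
        msg += f" Annotations not found for: {', '.join(species_not_found)}."
    return msg

def test_prepare_content_msg():
    # All species found: no error suffix expected.
    assert prepare_content_msg(["IL6", "IL10"], []) == (
        "Successfully extracted annotations for the species.")
    # Missing species should be listed explicitly.
    assert "Annotations not found for: Rab5" in prepare_content_msg(["IL6"], ["Rab5"])

test_prepare_content_msg()
```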
Updated the test_all_species function, which now covers all the expected outputs.
Cool. However, I am still missing the test for prepare_content_msg
and the comparison of the expected string produced by this method for all species (unless I somehow missed it).
```python
reversed_messages = current_state.values["messages"][::-1]
# Covers all of the use cases for the expected string on all the species
test_condition = False
for msg in reversed_messages:
    if isinstance(msg, ToolMessage) and msg.name == "get_annotation":
        print("ToolMessage Content:", msg.content)  # Debugging output
        if msg.artifact is None and ('ORI' in msg.content or
                "Successfully extracted annotations for the species"
                in msg.content or "Error" in msg.content):
            test_condition = True
            break
dic_annotations_data = current_state.values["dic_annotations_data"]
assert isinstance(dic_annotations_data, list), \
    f"Expected a list for model {model_id}, got {type(dic_annotations_data)}"
assert len(dic_annotations_data) > 0, \
    f"Expected species data for model {model_id}, but got empty list"
assert test_condition  # Expected output is validated
```
Rather than using prepare_content_msg, I have added this code, which checks the expected output against the tool message produced.
I see.
I think this can be made much clearer with my suggestions above.
I have created a new function, test_prepare_content_msg(), for checking the expected messages.
```python
"""
Process link to format it correctly.
"""
substrings = ["chebi/", "pato/", "pr/", "fma/", "sbo/"]
```
I am a bit concerned this will be difficult to maintain as there are a LOT of different ontologies. What is the problem that this method solves in regard to link formatting? If it is needed, is it possible to make it more general? Another idea would be just to include all of the ontology abbreviations from OLS if it is the same for each of them.
The problem is that, in some cases, the link doesn't work when get_miriam_annotation is called because the database name appears in the middle of the link.
For example, in model 537 for species sR:
sR http://identifiers.org/pato/PATO:0001537
As the returned link is invalid, I use these substrings to remove the unnecessary part of the link and make it valid.
Yes, we can include all of the ontology terms, maybe in the next release, as I don't know all the ontology abbreviations.
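The workaround described above can be sketched roughly as follows. This is an illustrative standalone version (the PR implements it as a `_process_link` method) using the same substring list:

```python
# Illustrative sketch of the link fix described above: strip the ontology
# sub-path (e.g. "pato/") that sometimes appears inside a MIRIAM URL, so
# the identifiers.org link resolves. Standalone version, not the exact
# Talk2BioModels implementation.
def process_link(link: str) -> str:
    substrings = ["chebi/", "pato/", "pr/", "fma/", "sbo/"]
    for substring in substrings:
        if substring in link:
            link = link.replace(substring, "")
    return link

print(process_link("http://identifiers.org/pato/PATO:0001537"))
# http://identifiers.org/PATO:0001537
```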
I see. I remember running into this issue as well once...
All of the OLS abbreviations can be found here: https://www.ebi.ac.uk/ols4/ontologies.
Thanks for the link
The review updates are looking much better 👍. Just a few minor comments at this point.
```python
term = "GO:ABC123"
label = fetch_from_ols(term)
assert label.startswith("Error: 404")
term_1 = "GO:0005886" #Negative result
```
```diff
-term_1 = "GO:0005886" #Negative result
+term_1 = "GO:0005886" #Positive result
```
I have corrected the comment
```python
label = fetch_from_ols(term)
assert label.startswith("Error: 404")
term_1 = "GO:0005886" #Negative result
term_2 = "GO:ABC123" #Positive result
```
```diff
-term_2 = "GO:ABC123" #Positive result
+term_2 = "GO:ABC123" #Negative result
```
I have corrected the comment
```diff
@@ -266,10 +261,13 @@ def _fetch_descriptions(self, data: List[dict[str, str]]) -> dict[str, str]:
     # In the following loop, we fetch the descriptions for the identifiers
     # based on the database type.
+    # Constants
+    ols_ontology_abbreviations = {'pato', 'chebi', 'sbo', 'fma', 'pr'}
```
Much cleaner with a named variable for the abbreviations 🙂. It looks like there are two places where this list is used. Would it be possible to make this a const global variable at the top of the file or in the init so that there is no replication?
I have made it a constant global variable.
```python
reversed_messages = current_state.values["messages"][::-1]

# Covered all the use case for the expecetd sting on all the species
```
```diff
-# Covered all the use case for the expecetd sting on all the species
+# Covers all of the use cases for the expected string on all the species
```
I have updated the comment
Hi @dmccloskey, I am making some more updates and will let you know. Could you please merge the pull request after I make these changes?
The updates look good 👍. Nice work 💪.
Before we can merge, please take care of the following:
- the merge conflicts
- open an issue for adding all of the OLS abbreviations
Hi @dmccloskey, I am still working on some updates to this PR. Please review the code only after you have heard from me. Thanks.
Hi @dmccloskey, Thank you for your feedback. I’ve addressed the items you mentioned and would like to provide a detailed update on the recent changes:
In get_annotation I have added the code below. In test_get_annotation I have updated as follows:
In addition, I will open a separate issue to track the addition of all missing OLS abbreviations to ensure comprehensive coverage across models. Additional verifications: please let me know if you have any further questions or require additional changes :)
Please integrate my suggestions (you may have to modify slightly), check that the tests and linting pass, and then merge 🙂.
```diff
@@ -229,7 +234,7 @@ def _process_link(self, link: str) -> str:
     """
     Process link to format it correctly.
     """
-    substrings = ["chebi/", "pato/", "pr/", "fma/", "sbo/"]
+    substrings = ["chebi/", "pato/", "pr/", "fma/", "sbo/", "go/"]
```
```diff
-substrings = ["chebi/", "pato/", "pr/", "fma/", "sbo/", "go/"]
```
```diff
@@ -229,7 +234,7 @@ def _process_link(self, link: str) -> str:
     """
     Process link to format it correctly.
     """
-    substrings = ["chebi/", "pato/", "pr/", "fma/", "sbo/"]
+    substrings = ["chebi/", "pato/", "pr/", "fma/", "sbo/", "go/"]
     for substring in substrings:
```
```diff
-for substring in substrings:
+for substring in ols_ontology_abbreviations:
```
```diff
@@ -229,7 +234,7 @@ def _process_link(self, link: str) -> str:
     """
     Process link to format it correctly.
     """
-    substrings = ["chebi/", "pato/", "pr/", "fma/", "sbo/"]
+    substrings = ["chebi/", "pato/", "pr/", "fma/", "sbo/", "go/"]
     for substring in substrings:
         if substring in link:
```
```diff
-if substring in link:
+if substring + '/' in link:
```
Hi @dmccloskey ,
I’ve made the suggested updates and verified that both pytest and linting have passed successfully.
Please let me know if there’s anything else you'd like me to address.
@Rakesh-Seenu It looks like there are a few linting failures and some failing tests. Please take a look and ping me when they have been resolved.
🎉 This PR is included in version 1.14.0 🎉 The release is available on GitHub release. Your semantic-release bot 📦🚀
For authors
Video3.mp4
Description
This PR introduces a new tool in Talk2BioModels that helps retrieve species annotations from a model.
When prompted, the tool uses the get_miriam_annotation function from the basico library in the backend
to retrieve the annotations of the specified species.
When this function is called, it returns the following details:
However, the species name is often not very informative. Hence, I have developed modules that make API calls to external databases to fetch descriptions of the species.
Since different species are listed in different databases, fetching their descriptions requires extra steps.
How the Tool Fetches Descriptions
The species URLs point to multiple external databases, where more detailed descriptions are available.
I have implemented three API handler files to retrieve descriptions from a specific set of databases using the provided URL.
When a user asks for a species annotation, the tool first retrieves the basic details (URL, Name, Qualifier).
Then, based on which database the species belongs to, the tool calls the appropriate API handler file to fetch its description.
Finally, all the information is displayed in an easy-to-read table.
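The dispatch described above can be sketched like this. The handler names and URL patterns are illustrative assumptions (the PR splits the real logic across three API handler files), but the routing idea is the same:

```python
import re

# Stub handlers standing in for the PR's three API handler files; in the
# real tool each one calls out to an external database API.
def fetch_ols_description(url: str) -> str:
    return f"OLS description for {url}"

def fetch_uniprot_description(url: str) -> str:
    return f"UniProt description for {url}"

def fetch_kegg_description(url: str) -> str:
    return f"KEGG description for {url}"

# Route each annotation URL to a handler by database; the patterns are
# illustrative, not the PR's actual matching rules.
HANDLERS = [
    (re.compile(r"identifiers\.org/(GO|CHEBI|PATO|PR|FMA|SBO)"), fetch_ols_description),
    (re.compile(r"identifiers\.org/uniprot"), fetch_uniprot_description),
    (re.compile(r"identifiers\.org/kegg"), fetch_kegg_description),
]

def fetch_description(url: str) -> str:
    """Pick the matching database handler for a MIRIAM annotation URL."""
    for pattern, handler in HANDLERS:
        if pattern.search(url):
            return handler(url)
    return "Description not found"
```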
Results shown to the user (see the demo)
The retrieved annotations are displayed in a scrollable table with the following columns:
What This Tool Can Do :
✅ Retrieve annotations for one or multiple species in a model.
✅ Fetch all species annotations in a given model.
✅ Remember the model ID from chat history, so users don’t need to enter it each time.
✅ Handle errors gracefully – If a species is not found or its description is missing, the user is notified in the front end.
Upcoming Feature Enhancements
This version only allows users to view species annotations, but future updates will add more features:
Include databases like InterPro and GO
Users will be able to edit and update species annotations directly in the table.
Instead of asking for specific species by name, users will be able to make broader requests, such as:
"Show me annotations of all the interleukins in the model."
Fixes #57 (issue)
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests you conducted to verify your changes. These may involve creating new test scripts or updating existing ones (in the tests folder, e.g. tests/testX.py).
Checklist
- I have added tests (in the tests folder) that prove my fix is effective or that my feature works

For reviewers
Checklist pre-approval
Checklist post-approval
- Did the PR merge develop into main? If so, please make sure to add a prefix (feat/fix/chore) and/or a suffix BREAKING CHANGE (if it's a major release) to your commit message.
Checklist post-merge
- Did the PR merge develop into main, and is it supposed to run an automated release workflow (if applicable)? If so, please make sure to check under the "Actions" tab to see if the workflow has been initiated, and return later to verify that it has completed successfully.