
feat: Add tool for extracting annotions of species in an ODE model #81

Merged (27 commits) on Feb 4, 2025

Conversation

@Rakesh-Seenu (Contributor) commented on Jan 29, 2025

For authors

Video3.mp4

Description

This PR introduces a new tool in Talk2BioModels that helps retrieve species annotations from a model.
When the tool is prompted, it uses the get_miriam_annotation function from the basico library in the backend
to retrieve the annotations of the specified species.

When this function is called, it returns the following details (a minimal sketch of the call follows this list):

  • Name – The species name as recorded in the database.
  • URL – A link to an external database where the species is listed.
  • Qualifier – A category or tag that describes the species.
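For context, here is a minimal sketch of the underlying basico call; the model ID and species name come from an example discussed later in this thread, while the exact keys of the returned dictionary are assumptions and may differ:

import basico

# Load a model from BioModels by its numeric ID (537 is only an example).
basico.load_biomodel(537)

# Retrieve the MIRIAM annotation for one species of the loaded model.
annotation = basico.get_miriam_annotation(name="sR")

# The result typically contains a list of description entries, each carrying
# a qualifier and a link to an external database (keys shown are illustrative).
for entry in annotation.get("descriptions", []):
    print(entry.get("qualifier"), entry.get("id"), entry.get("uri"))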

However, the species name alone is often not very informative. Hence, I have developed modules that make API calls to the external databases to fetch descriptions for each species.
Since different species are listed in different databases, fetching their descriptions requires extra steps.

How the Tool Fetches Descriptions

The species URLs point to multiple external databases, where more detailed descriptions are available.
I have implemented three API handler files to retrieve descriptions from a specific set of databases using the provided URL.

When a user asks for a species annotation, the tool first retrieves the basic details (URL, Name, Qualifier).
Then, based on which database the species belongs to, the tool calls the appropriate API handler file to fetch its description.
Finally, all the information is displayed in an easy-to-read table.
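As a rough sketch of that dispatch step (only fetch_from_ols is a name confirmed later in this PR; the other handler names and routing rules are assumptions for illustration):

# Stub handlers standing in for the three API handler modules; the real ones
# call the corresponding web services.
def fetch_from_ols(identifier: str) -> str:
    return f"OLS description for {identifier}"

def fetch_from_kegg(identifier: str) -> str:
    return f"KEGG description for {identifier}"

def fetch_from_uniprot(identifier: str) -> str:
    return f"UniProt description for {identifier}"

OLS_ONTOLOGY_ABBREVIATIONS = {"chebi", "pato", "pr", "fma", "sbo"}

def fetch_description(link: str, identifier: str) -> str:
    """Route an annotation link to the handler for the database it points to."""
    if any(f"/{onto}/" in link for onto in OLS_ONTOLOGY_ABBREVIATIONS):
        return fetch_from_ols(identifier)
    if "kegg" in link:
        return fetch_from_kegg(identifier)
    if "uniprot" in link:
        return fetch_from_uniprot(identifier)
    return "Description not found"

print(fetch_description("http://identifiers.org/pato/PATO:0001537", "PATO:0001537"))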

Results Shown to the User (see the demo)

The retrieved annotations are displayed in a scrollable table with the following columns (a small display sketch follows this list):

  • Species Name: The name of the species.
  • Description: A brief explanation of the species from the database.
  • Database: The name of the database where the species is listed.
  • ID: A unique identifier, shown as a clickable hyperlink that directs the user to the species' database page.
  • Qualifier: Additional information about the species annotation.
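A small sketch of how such a table can be rendered, assuming a Streamlit front end (an assumption here; the PR's demo video shows the actual UI). The example row and column configuration are illustrative only:

import pandas as pd
import streamlit as st

# One example row shaped like the annotation records described above.
df = pd.DataFrame([
    {
        "Species Name": "sR",
        "Description": "example description fetched from the external database",
        "Database": "PATO",
        "ID": "https://identifiers.org/PATO:0001537",
        "Qualifier": "is",
    },
])

# Scrollable table; the ID column is rendered as a clickable hyperlink.
st.dataframe(
    df,
    column_config={"ID": st.column_config.LinkColumn("ID")},
    hide_index=True,
)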

What This Tool Can Do:

✅ Retrieve annotations for one or multiple species in a model.
✅ Fetch all species annotations in a given model.
✅ Remember the model ID from chat history, so users don’t need to enter it each time.
✅ Handle errors gracefully – If a species is not found or its description is missing, the user is notified in the front end.

Upcoming Feature Enhancements

This version only allows users to view species annotations, but future updates will add more features:

  • Adding More Databases for API Calling
    Include databases such as InterPro and GO.
  • Editing & Updating Annotations
    Users will be able to edit and update species annotations directly in the table.
  • Support for Abstract Questions
    Instead of asking for specific species by name, users will be able to make broader requests, such as:
    "Show me annotations of all the InterLeukins in the model ."

Fixes #57 (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests you conducted to verify your changes. These may involve creating new test scripts or updating existing ones.

  • Added a new test (test_get_annotation) in the tests folder
  • Added new function(s) to an existing test(s) (e.g.: tests/testX.py)
  • No new tests added (Please explain the rationale in this case)

Checklist

  • My code follows the style guidelines mentioned in the Code/DevOps guides
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (e.g. MkDocs)
  • My changes generate no new warnings
  • I have added or updated tests (in the tests folder) that prove my fix is effective or that my feature works
  • New and existing tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

For reviewers

Checklist pre-approval

  • Is there enough documentation?
  • If a new feature has been added, or a bug fixed, has a test been added to confirm good behavior?
  • Does the test(s) successfully test edge/corner cases?
  • Does the PR pass the tests? (if the repository has continuous integration)

Checklist post-approval

  • Does this PR merge develop into main? If so, please make sure to add a prefix (feat/fix/chore) and/or a suffix BREAKING CHANGE (if it's a major release) to your commit message.
  • Does this PR close an issue? If so, please make sure to descriptively close this issue when the PR is merged.

Checklist post-merge

  • When you approve of the PR, merge and close it (Read this article to know about different merge methods on GitHub)
  • Did this PR merge develop into main and is it supposed to run an automated release workflow (if applicable)? If so, please make sure to check under the "Actions" tab to see if the workflow has been initiated, and return later to verify that it has completed successfully.

@gurdeep330 gurdeep330 requested a review from dmccloskey January 29, 2025 21:54
@gurdeep330 gurdeep330 marked this pull request as ready for review January 29, 2025 21:54
@gurdeep330 gurdeep330 added the "enhancement" (New feature or request) and "Talk2Biomodels" labels Jan 29, 2025
@dmccloskey (Member) left a comment:

Nice work 💪. I am very pleased to see the thought that went into fetching additional information from different ontologies and databases. @awmulyadi I think the APIs could be useful for you for the enrichment part. At least chemicals, proteins, genes, GO, and diseases appear to be covered as many are included in OLS.

I have several comments in regard to the testing and how the multiple OLS sub databases are handled. Please reach out if you have any questions.

aiagents4pharma/talk2biomodels/api/kegg.py (outdated, resolved)
aiagents4pharma/talk2biomodels/api/kegg.py (outdated, resolved)
aiagents4pharma/talk2biomodels/tests/test_api.py (outdated, resolved)
current_state = app.get_state(config)
dic_annotations_data = current_state.values["dic_annotations_data"]
print (dic_annotations_data)
assert isinstance(dic_annotations_data, list)
Member:

This is a good idea to test multiple models. However, simply testing that some data was created is not very rigorous. On top of what you already have testing that the state data was created properly, I would recommend the following:

  1. Prior to this test, a simple test to ensure that the outputs of prepare_content_msg are as expected.

  2. I would use the expected string from prepare_content_msg for each of the different models for all species as the test case.

Contributor Author:

Updated the test_all_species function, which covers all the expected outputs.

Member:

Cool. However, I am still missing the test for prepare_content_msg and the comparison of the expected string produced by this method for all species (unless I somehow missed it).

Contributor Author:

reversed_messages = current_state.values["messages"][::-1]

# Covered all the use case for the expecetd sting on all the species
test_condition = False
for msg in reversed_messages:
    if isinstance(msg, ToolMessage) and msg.name == "get_annotation":
        print("ToolMessage Content:", msg.content)  # Debugging output
        if msg.artifact is None and ('ORI' in msg.content or
                                     "Successfully extracted annotations for the species"
                                     in msg.content or "Error" in msg.content):
            test_condition = True
            break

dic_annotations_data = current_state.values["dic_annotations_data"]

assert isinstance(dic_annotations_data, list),\
    f"Expected a list for model {model_id}, got {type(dic_annotations_data)}"
assert len(dic_annotations_data) > 0,\
    f"Expected species data for model {model_id}, but got empty list"
assert test_condition # Expected output is validated

Rather than using prepare_content_msg, I have added this code, which checks the expected output against the tool message produced.

Member:

I see.

I think this can be made much clearer with my suggestions above.

Contributor Author:

I have created a new function, test_prepare_content_msg(), for checking the expected messages.

aiagents4pharma/talk2biomodels/tools/get_annotation.py (outdated, resolved)
"""
Process link to format it correctly.
"""
substrings = ["chebi/", "pato/", "pr/", "fma/", "sbo/"]
Member:

I am a bit concerned this will be difficult to maintain as there are a LOT of different ontologies. What is the problem that this method solves in regard to link formatting? If it is needed, is it possible to make it more general? Another idea would be just to include all of the ontology abbreviations from OLS if it is the same for each of them.

Contributor Author:

The problem is that, in some cases, the link doesn't work when get_miriam_annotation is called, because the database name appears in the middle of the link.

For example, in model 537 for species sR:

sR http://identifiers.org/pato/PATO:0001537

As the returned link is invalid, I use these substrings to remove the unnecessary part of the link and make it valid.
Yes, we can include all of the ontology abbreviations, maybe in the next release, as I don't know all of them.
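A simplified sketch of that fix, using the example link above (the real _process_link implementation in the PR may differ in detail):

substrings = ["chebi/", "pato/", "pr/", "fma/", "sbo/"]

def process_link(link: str) -> str:
    """Drop the ontology segment so the identifiers.org link resolves."""
    for substring in substrings:
        link = link.replace(substring, "")
    return link

# http://identifiers.org/pato/PATO:0001537 -> http://identifiers.org/PATO:0001537
print(process_link("http://identifiers.org/pato/PATO:0001537"))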

Member:

I see. I remember running into this issue as well once...

All of the OLS abbreviations can be found here: https://www.ebi.ac.uk/ols4/ontologies.

Contributor Author:

Thanks for the link

aiagents4pharma/talk2biomodels/tools/get_annotation.py (outdated, resolved)
@dmccloskey (Member) left a comment:

The review updates are looking much better 👍. Just a few minor comments at this point.

term = "GO:ABC123"
label = fetch_from_ols(term)
assert label.startswith("Error: 404")
term_1 = "GO:0005886" #Negative result
Member:

Suggested change
term_1 = "GO:0005886" #Negative result
term_1 = "GO:0005886" #Positive result

Contributor Author:

I have corrected the comment

label = fetch_from_ols(term)
assert label.startswith("Error: 404")
term_1 = "GO:0005886" #Negative result
term_2 = "GO:ABC123" #Positive result
Member:

Suggested change
term_2 = "GO:ABC123" #Positive result
term_2 = "GO:ABC123" #Negative result

Contributor Author:

I have corrected the comment
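For reference, a bare-bones sketch of what an OLS lookup such as fetch_from_ols can look like; the endpoint and response parsing shown here are assumptions based on the public OLS API, not necessarily the PR's exact code:

import requests

def fetch_from_ols(term: str) -> str:
    """Look up a term label in the EBI Ontology Lookup Service."""
    url = f"https://www.ebi.ac.uk/ols4/api/terms?obo_id={term}"
    response = requests.get(url, timeout=10)
    if response.status_code != 200:
        # An unknown term such as "GO:ABC123" produces an error string,
        # which is what the negative test case above asserts on.
        return f"Error: {response.status_code}"
    terms = response.json().get("_embedded", {}).get("terms", [])
    return terms[0]["label"] if terms else "Description not found"

print(fetch_from_ols("GO:0005886"))  # a valid term
print(fetch_from_ols("GO:ABC123"))   # an invalid term, e.g. "Error: 404"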

@@ -266,10 +261,13 @@ def _fetch_descriptions(self, data: List[dict[str, str]]) -> dict[str, str]:

# In the following loop, we fetch the descriptions for the identifiers
# based on the database type.
# Constants
ols_ontology_abbreviations = {'pato', 'chebi', 'sbo', 'fma', 'pr'}
Member:

Much cleaner with a named variable for the abbreviations 🙂. It looks like there are two places where this list is used. Would it be possible to make this a const global variable at the top of the file or in the init so that there is no replication?

Contributor Author:

I have made it a constant global variable.
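For illustration, roughly what that placement looks like near the top of get_annotation.py (the values are the ones already shown in the diff; only the location changes):

# Module-level constant shared by _process_link and the description-fetching
# loop, so the set of OLS ontology abbreviations is defined only once.
ols_ontology_abbreviations = {'pato', 'chebi', 'sbo', 'fma', 'pr'}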


reversed_messages = current_state.values["messages"][::-1]

# Covered all the use case for the expecetd sting on all the species
Member:

Suggested change
# Covered all the use case for the expecetd sting on all the species
# Covers all of the use cases for the expected string on all the species

Contributor Author:

I have updated the comment

@Rakesh-Seenu (Contributor Author):

Hi @dmccloskey, I am making some more updates and will let you know when they are done. Could you please merge the pull request only after I have made these changes?

@dmccloskey (Member) left a comment:

The updates look good 👍. Nice work 💪.

Before we can merge, please take care of the following:

  1. the merge conflicts
  2. open an issue for adding all of the OLS abbreviations

@Rakesh-Seenu (Contributor Author):

Hi @dmccloskey, I am still working on some updates to this PR. Please review the code only after you have heard from me. Thanks!

@gurdeep330 gurdeep330 changed the title Feat annot Feat: Add tool for extracting annotions of species in an ODE model Feb 3, 2025
@gurdeep330 gurdeep330 changed the title Feat: Add tool for extracting annotions of species in an ODE model feat: Add tool for extracting annotions of species in an ODE model Feb 3, 2025
@Rakesh-Seenu (Contributor Author):

The updates look good 👍. Nice work 💪.

Before we can merge, please take care of the following:

  1. the merge conflicts
  2. open an issue for adding all of the OLS abbreviations

Hi @dmccloskey,

Thank you for your feedback. I’ve addressed the items you mentioned and would like to provide a detailed update on the recent changes:

  • Handling the iOS Species Return Issue:
    During testing, I discovered that iOS was returning None when it should have been returning the species. I investigated the issue, implemented a fix to properly handle the case when the species needs to be returned, and added a new test case specifically for model 20. This update ensures that the behavior is now consistent and the error has been resolved.

In get_annotation, I have added the code below:
(screenshot)

In test_get_annotation, I have updated it as follows:
(screenshot)

  • Updating OLS Abbreviations for Models like 56:
    I noticed that the GO database abbreviation was missing from the OLS abbreviations for model 56. I've now added it, and as demonstrated in the attached screenshot, it is displaying correctly.
    (screenshot)

In addition, I will open a separate issue to track the addition of all missing OLS abbreviations to ensure comprehensive coverage across models.

Additional Verifications:
I have run both pylint and coverage tests. The code meets our style guidelines, and the test coverage is complete with no conflicts, ensuring that all updates are in line with our quality standards.
(screenshots)

Please let me know if you have any further questions or require additional changes :)

@gurdeep330 gurdeep330 requested a review from dmccloskey February 3, 2025 12:14
@dmccloskey (Member) left a comment:

Please integrate my suggestions (you may have to modify slightly), check that the tests and linting pass, and then merge 🙂.

@@ -229,7 +234,7 @@ def _process_link(self, link: str) -> str:
"""
Process link to format it correctly.
"""
substrings = ["chebi/", "pato/", "pr/", "fma/", "sbo/"]
substrings = ["chebi/", "pato/", "pr/", "fma/", "sbo/", "go/"]
Member:

Suggested change
substrings = ["chebi/", "pato/", "pr/", "fma/", "sbo/", "go/"]

@@ -229,7 +234,7 @@ def _process_link(self, link: str) -> str:
"""
Process link to format it correctly.
"""
substrings = ["chebi/", "pato/", "pr/", "fma/", "sbo/"]
substrings = ["chebi/", "pato/", "pr/", "fma/", "sbo/", "go/"]
for substring in substrings:
Member:

Suggested change
for substring in substrings:
for substring in ols_ontology_abbreviations:

@@ -229,7 +234,7 @@ def _process_link(self, link: str) -> str:
"""
Process link to format it correctly.
"""
substrings = ["chebi/", "pato/", "pr/", "fma/", "sbo/"]
substrings = ["chebi/", "pato/", "pr/", "fma/", "sbo/", "go/"]
for substring in substrings:
if substring in link:
Member:

Suggested change
if substring in link:
if substring + '/' in link:
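Taken together, the three suggestions above lead to roughly the following shape (a sketch only; in the PR this is a method of the tool class, and the replacement step itself is assumed since it is not shown in the excerpt):

ols_ontology_abbreviations = {'pato', 'chebi', 'sbo', 'fma', 'pr', 'go'}

def _process_link(link: str) -> str:
    """Process link to format it correctly."""
    for substring in ols_ontology_abbreviations:
        if substring + '/' in link:
            link = link.replace(substring + '/', '')
    return link

print(_process_link("http://identifiers.org/pato/PATO:0001537"))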

Contributor Author:

Hi @dmccloskey,
I’ve made the suggested updates and verified that both pytest and linting have passed successfully.

Please let me know if there’s anything else you'd like me to address.

@dmccloskey previously approved these changes on Feb 3, 2025
@dmccloskey dismissed their stale review on February 3, 2025 16:21

Checks failed

@dmccloskey (Member):

@Rakesh-Seenu It looks like there are a few linting failures and some failing tests. Please take a look and ping me when they have been resolved.

@dmccloskey dmccloskey merged commit eb27e13 into VirtualPatientEngine:main Feb 4, 2025
3 of 6 checks passed
github-actions bot (Contributor) commented on Feb 4, 2025:

🎉 This PR is included in version 1.14.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀


Successfully merging this pull request may close these issues.

FEATURE: Annotation Tool