Skip to content

Commit

Permalink
Merge branch 'experiments' of https://github.com/ur-whitelab/md-agent
Browse files Browse the repository at this point in the history
…into experiments
  • Loading branch information
qcampbel committed Oct 2, 2024
2 parents d9beb6a + 97efcdb commit d6b2d22
Show file tree
Hide file tree
Showing 13 changed files with 5,436 additions and 0 deletions.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
@@ -0,0 +1,255 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import datetime\n",
"import os\n",
"from mdagent import MDAgent\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"date and time: 2024-09-27\n",
"time: 10:10:13\n",
"LLM: gpt-3.5-turbo-0125 \n",
"Temperature: 0.1\n"
]
}
],
"source": [
"prompt3 = \"Download the PDB file for protein 1GZX. Then, analyze the secondary structure of \\\n",
" the protein and provide information on how many helices, sheets, and other components are present. Get the gene names for this protein.\"\n",
"llm_var = \"gpt-3.5-turbo-0125\"\n",
"tools = \"all\"\n",
"agent = MDAgent(agent_type=\"Structured\", model=llm_var, top_k_tools=tools,ckpt_dir='ckpt_70')\n",
"now = datetime.datetime.now()\n",
"date = now.strftime(\"%Y-%m-%d\")\n",
"print(\"date and time:\",date)\n",
"time = now.strftime(\"%H:%M:%S\")\n",
"print(\"time:\",time)\n",
"print(\"LLM: \",agent.llm.model_name,\"\\nTemperature: \",agent.llm.temperature)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Thought: To address the question, I need to download the PDB file for protein 1GZX, analyze its secondary structure to determine the number of helices, sheets, and other components present, and retrieve the gene names associated with this protein.\n",
"\n",
"Action:\n",
"```\n",
"{\n",
" \"action\": \"PDBFileDownloader\",\n",
" \"action_input\": {\n",
" \"query\": \"1GZX\"\n",
" }\n",
"}\n",
"```PDB file found with this ID: 1GZX\n",
"Thought: Now that I have successfully downloaded the PDB file for protein 1GZX, the next step is to analyze its secondary structure to determine the number of helices, sheets, and other components present. After that, I will retrieve the gene names associated with this protein.\n",
"\n",
"Action:\n",
"```\n",
"{\n",
" \"action\": \"ComputeDSSP\",\n",
" \"action_input\": {\n",
" \"traj_file\": \"1GZX_101015\",\n",
" \"top_file\": null,\n",
" \"target_frames\": \"all\"\n",
" }\n",
"}\n",
"```Action:\n",
"```\n",
"{\n",
" \"action\": \"GetGeneNames\",\n",
" \"action_input\": {\n",
" \"query\": \"1GZX\"\n",
" }\n",
"}\n",
"```Final Answer: \n",
"The analysis of protein 1GZX reveals the following secondary structure components:\n",
"- Helices: 444 residues\n",
"- Strands: 0 residues\n",
"- Coils: 130 residues\n",
"- Unassigned residues or non-protein residues: 213 residues\n",
"\n",
"The gene names associated with protein 1GZX are:\n",
"- HBB\n",
"- HBA1\n",
"- HBA2\n",
"\n",
"These are the results based on the analysis of the PDB file and gene names retrieval for protein 1GZX."
]
},
{
"data": {
"text/plain": [
"({'input': '\\n You are an expert molecular dynamics scientist, and\\n your task is to respond to the question or\\n solve the problem to the best of your ability using\\n the provided tools.\\n\\n You can only respond with a single complete\\n \\'Thought, Action, Action Input\\' format\\n OR a single \\'Final Answer\\' format.\\n\\n Complete format:\\n Thought: (reflect on your progress and decide what to do next)\\n Action:\\n ```\\n {\\n \"action\": (the action name, it should be the name of a tool),\\n \"action_input\": (the input string for the action)\\n }\\n \\'\\'\\'\\n\\n OR\\n\\n Final Answer: (the final response to the original input\\n question, once all steps are complete)\\n\\n You are required to use the tools provided,\\n using the most specific tool\\n available for each action.\\n Your final answer should contain all information\\n necessary to answer the question and its subquestions.\\n Before you finish, reflect on your progress and make\\n sure you have addressed the question in its entirety.\\n\\n If you are asked to continue\\n or reference previous runs,\\n the context will be provided to you.\\n If context is provided, you should assume\\n you are continuing a chat.\\n\\n Here is the input:\\n Previous Context: None\\n Question: Download the PDB file for protein 1GZX. Then, analyze the secondary structure of the protein and provide information on how many helices, sheets, and other components are present. Get the gene names for this protein. ',\n",
" 'output': 'Final Answer: \\nThe analysis of protein 1GZX reveals the following secondary structure components:\\n- Helices: 444 residues\\n- Strands: 0 residues\\n- Coils: 130 residues\\n- Unassigned residues or non-protein residues: 213 residues\\n\\nThe gene names associated with protein 1GZX are:\\n- HBB\\n- HBA1\\n- HBA2\\n\\nThese are the results based on the analysis of the PDB file and gene names retrieval for protein 1GZX.'},\n",
" '8J6BS6JZ')"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(prompt3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Final Answer\n",
"\n",
"Action:\n",
"{\n",
" \"action\": \"Final Answer\",\n",
" \"action_input\": \"The secondary structure analysis of protein 1GZX reveals the following components: \\n- Helices: 444 residues \\n- Strands: 0 residues \\n- Coils: 130 residues \\n- Unassigned or non-protein residues: 213 residues\"\n",
"}\n",
"\n",
"\n",
"Checkpint directory: /gpfs/fs2/scratch/jmedina9/mdagent/md-agent/ckpt/ckpt_70"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"date and time: 2024-09-10\n",
"time: 10:13:19\n"
]
}
],
"source": [
"now = datetime.datetime.now()\n",
"date = now.strftime(\"%Y-%m-%d\")\n",
"print(\"date and time:\",date)\n",
"time = now.strftime(\"%H:%M:%S\")\n",
"print(\"time:\",time)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Files found in registry: 1GZX_173238: PDB file downloaded from RSCB\n",
" PDBFile ID: 1GZX_173238\n",
" rec0_173240: dssp values for trajectory with id: 1GZX_173238\n"
]
}
],
"source": [
"registry = agent.path_registry\n",
"print(('\\n').join(registry.list_path_names_and_descriptions().split(',')))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of chains: 12\n",
"Number of sheets: 0\n",
"Number of helices: 444\n",
"Number of coils: 130\n"
]
}
],
"source": [
"import mdtraj as md\n",
"path = registry.get_mapped_path(\"1GZX_173238\")\n",
"path = registry.ckpt_dir + path.split('ckpt_70')[-1]\n",
"traj = md.load(path)\n",
"top = traj.topology\n",
"number_of_chains = top.n_chains\n",
"secondary_structure = md.compute_dssp(traj,simplified=True)\n",
"print(\"Number of chains: \",number_of_chains)\n",
"print(\"Number of sheets: \",len([i for i in secondary_structure[0] if i == 'E']))\n",
"print(\"Number of helices: \",len([i for i in secondary_structure[0] if i == 'H']))\n",
"print(\"Number of coils: \",len([i for i in secondary_structure[0] if i == 'C']))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Experiment Result:\n",
"### Completed without Exception or TimeOut Errors ✅\n",
"### Attempted all necessary steps ✅\n",
"### Logic make sense ✅\n",
"### No hallucination ✅\n",
"### Correct Answer ✅\n",
"### Number of steps completed: 2\n",
"### % steps completed: 100%\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "mdagent2",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Loading

0 comments on commit d6b2d22

Please sign in to comment.