IC-Computational-Biology-Society/NCBI_text_mining_session

Text mining NCBI PubMed using Entrez

In this session, we will cover some basic features of the Entrez module in the Biopython package. The session introduces scripting automated searches of the NCBI PubMed database, along with some rudimentary approaches to analysing text data. Anyone who has attended both of our previous Python workshops will have the background needed to complete this session. If you were unable to attend them, all of their Jupyter notebooks are posted in repositories within the IC-Computational-Biology-Society organisation.
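To give a flavour of what a scripted PubMed search can look like, here is a minimal sketch rather than the session's own notebook code. It assumes Biopython is installed and uses `Entrez.esearch` and `Entrez.read` from `Bio.Entrez`. The helper names `build_query` and `search_pubmed`, the search term, the date range, and the email address are all illustrative assumptions.

```python
def build_query(term, start_year, end_year):
    """Combine a free-text term with a publication-date range.

    Uses PubMed field tags: [Title/Abstract] restricts the term to titles
    and abstracts, [dp] is the date-of-publication field.
    """
    return f'"{term}"[Title/Abstract] AND {start_year}:{end_year}[dp]'


def search_pubmed(query, email, retmax=20):
    """Return (total hit count, list of PMIDs) for a PubMed query.

    Hypothetical helper: needs Biopython and network access.
    """
    # Imported here so build_query can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address with each request
    handle = Entrez.esearch(db="pubmed", term=query, retmax=retmax)
    result = Entrez.read(handle)  # parse the XML reply into a dict-like object
    handle.close()
    return int(result["Count"]), list(result["IdList"])


# Example call (placeholder email; replace with your own address):
# count, pmids = search_pubmed(build_query("text mining", 2015, 2020),
#                              email="your.name@example.org")
# print(count, pmids[:5])
```

The returned PMIDs could then be passed to `Entrez.efetch` to download the corresponding records. Keeping the query construction separate from the network call makes it easy to inspect a query before sending it to NCBI.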

NB: This session does not cover natural language processing or machine learning. Nevertheless, it should give you a foundation for further investigation using dedicated Python packages, such as NLTK. By the end of the session, you should be able to construct your own dataset of NCBI PubMed text data, which you could then use to start training machine learning models.
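As an illustration of the kind of rudimentary text analysis that stops short of NLP, the sketch below counts word frequencies across a small collection of abstracts using only the standard library. The two abstracts are made-up placeholder strings, not real PubMed records, and the stop-word list and the function names `tokenize` and `word_counts` are assumptions chosen for this example.

```python
import re
from collections import Counter

# Placeholder strings standing in for downloaded abstracts (not real records).
abstracts = [
    "Text mining of biomedical literature supports gene discovery.",
    "Gene expression analysis and text mining methods are compared.",
]

# A deliberately tiny, illustrative stop-word list.
STOPWORDS = {"of", "and", "are", "the", "a", "in", "to"}


def tokenize(text):
    """Lower-case the text and return its alphabetic words, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def word_counts(texts):
    """Tally word occurrences across every text in the collection."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


counts = word_counts(abstracts)
print(counts.most_common(3))
```

A table of counts like this is one simple form a dataset could take. Dedicated packages such as NLTK offer more careful tokenisation and fuller stop-word lists.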

If you are attending our virtual interactive session on Microsoft Teams, please make sure you can run Anaconda. It is available from Imperial College's AppsAnywhere platform or from the official Anaconda website (the latter is recommended only if you cannot access AppsAnywhere or are completing the tutorial outside the scheduled session).

Details of use

This tutorial is intended for educational use. If you would like to use any of this material for teaching or for other purposes outside the remit of the Imperial College Computational Biology Society, please contact the author.

Author

Joseph I. J. Ellaway

Email: [email protected]
