Full Spectrum Bioinformatics

DOI NSF-1942647.

licensebuttons by-nc-sa
Authors: Jesse Zaneveld1, Nia Prabhu*1, Aziz Bajouri*1,2, Ayomikun Akinrinade*1,3, Dr. Mushtaq Bilal*4

* Chapter and Vignette authors contributed equally and are listed in chronological order of first contribution.
1 Division of Biological Sciences, School of STEM, University of Washington, Bothell, Washington, USA
2 Division of Computer and Software Systems, School of STEM, University of Washington, Bothell, Washington, USA
3 Division of Health Studies, School of Nursing and Health Studies, University of Washington, Bothell, Washington, USA
4

About the Project

Full Spectrum Bioinformatics is a free online text designed to introduce key topics in Bioinformatics using the Python programming language. The text is written in interactive Jupyter Notebooks, which allow you to try out and modify example code and analyses.

In addition to explanations of concepts, Full Spectrum Bioinformatics also includes Bioinformatics Vignettes written by readers of the text. Each vignette focuses on a particular core concept and shows how readers have applied that concept to their research projects.

How to Read the Text

If you are already familiar with GitHub and Jupyter Notebooks, you can download the entire project and run it interactively, or click the 'Open in Colab' links (they look like this: Open In Colab) to open interactive versions of each section in Google Colab (you will need to 'Save as' your own copy in order to change code).

If you would just like to read a chapter, you can also view a static version of each section using the nbviewer links (they look like this: Open in nbviewer). nbviewer stands for 'notebook viewer', so this is just a way to view chapters with code in them without actually running the code. This will generally be the best way to view the chapters non-interactively.

Finally, you can also use the direct GitHub links (the link that's the name of each chapter) to view any chapter. This shows the chapter on GitHub. It usually works well, but you may sometimes get a GitHub error message. Reloading the page or using the Open in nbviewer link usually avoids this issue.

Table of Contents

The text is currently in prototype status. Chapters with content you can preview are linked below:

Foreword

Open in nbviewer Foreword

Chapter 1. Introduction

The Many Paths to Bioinformatics

An Absurdly Brief Introduction to Biology

An Absurdly Brief Introduction to Computer Science

An Absurdly Brief Introduction to Statistics

Chapter 2. The Command Line

Open In Colab Open in nbviewer Using the Command Line

Open in nbviewer Exercise. Little Brother is Missing: practice navigating on the command line

Open in nbviewer Exercise. Duck vs. Yeast: using BLAST+ on the command line to detect sequence similarity

Chapter 3. Exploring Python

Warm-up Exercise: Spot the Difference

Open In Colab Open in nbviewer Exploring Python

Open In Colab Open in nbviewer A Tour of Python Data Types (ints, floats, boolean values, strings, lists, dicts, & sets)

Open In Colab Open in nbviewer A Tour of Python Syntax (functions, conditions, iteration, classes)

Open In Colab Open in nbviewer A Quick Win: using Python to run Statistical Tests and Make Simple Graphs

Open In Colab Open in nbviewer Another Quick Win: Loading tabular data with Pandas DataFrames

Chapter 4. Project Design

Open In Colab Open in nbviewer Using Literature Surveys to Ask Good Questions and Propose Testable Hypotheses

Open In Colab Open in nbviewer Write a Literature Synthesis...and get your Introduction for free!

Open In Colab Open in nbviewer Zotero for Beginners (a.k.a. How to Avoid Repeatedly Reformatting 96 Citations by Hand)

Chapter 5. Biological Sequences

Open In Colab Open in nbviewer An introduction to Biological Sequences

Open In Colab Open in nbviewer Representing and Manipulating Biological Sequences as Python Strings

Open In Colab Open in nbviewer Analyzing Biological Sequences with For Loops and If Statements

Open In Colab Open in nbviewer Reading and writing FASTA files using Python

Open In Colab Open in nbviewer Vignette (Aziz Bajouri): Using set objects to find circular RNAs involved in multiple diseases

Open In Colab Open in nbviewer Exercise: Error Bingo

Open In Colab Open in nbviewer Error Messages in Python

Open In Colab Open in nbviewer Vignette (Nia Prabhu): Using For Loops and Dictionaries to Compare Nucleotide Composition in Pandemic and Non-Pandemic Causing Influenza Strains

Open In Colab Open in nbviewer Capstone: testing for depletion of CG dinucleotides in the human genome

Chapter 6. 'Omics

Open In Colab Open in nbviewer An Introduction to 'Omics

Open In Colab Open in nbviewer Working with Tabular 'Omic data in Python using Pandas

Open In Colab Open in nbviewer Joining and Filtering Pandas DataFrames

Open In Colab Open in nbviewer Pandas Case Study: Analyzing tabular sleep data from the NHANES health survey

Analyzing Microbiome Alpha Diversity in Python

Analyzing Microbiome Beta Diversity in Python

Open In Colab Open in nbviewer Simulating the Effect of Sequencing Depth on Diversity Estimates

Chapter 7. Project Organization Revisited: Leveling up your Process

Reflecting on your Project so Far

Project Organization Strategies for Collaborative and Reproducible Research

Test Code: a powerful strategy for ensuring your results aren't lies.

Chapter 8. Visualization

Graphs as a Visual Language

Open In Colab Open in nbviewer Exercise: Anger Tufte

Open In Colab Open in nbviewer Representing Correlation

Representing Distribution

Chapter 9. Alignment and Phylogenetics

Part 1. Alignment

Homology and Alignment

Open In Colab Open in nbviewer Global Alignment with the Needleman-Wunsch algorithm

Local Alignment with the Smith-Waterman algorithm

BLAST and the k-mer trick

Part 2. Phylogenetics

Tree thinking

Open In Colab Open in nbviewer Representing Phylogenetic Trees with Python Classes

Open In Colab Open in nbviewer Generating Trees Using Birth-Death Models

Working with Traits on Trees

Maximum Parsimony Ancestral State Reconstruction

Phylogenetic Comparative Methods

Trait prediction

Chapter 10. Simulation

Open In Colab Open in nbviewer Simulating the Population Genetics of Natural Selection and Genetic Drift

Simulating Networks

Simulating the Evolution of Social Behavior

Chapter 11. Statistics

Open In Colab Open in nbviewer Linear Models - a Statistical Swiss Army Knife

Open In Colab Open in nbviewer Monte Carlo simulation and the Fundamental Unity of Statistical Hypothesis Tests

Statistical Distributions and Parametric Tests

Open In Colab Open in nbviewer Rank Transformations

Open In Colab Open in nbviewer Monte Carlo simulation of Effect Size, Sample Size, and Significance

Open In Colab Open in nbviewer Dealing with Multiple Comparisons

Open In Colab Open in nbviewer Exercise: Revising your writing about statistical results

An Introduction to Maximum Likelihood optimization

The Best Model of A Cat is a Cat - model complexity, overfitting, and the AIC

An Introduction to Bayesian Approaches

Chapter 12. Multivariate Statistics and Machine Learning

Unsupervised Classification: of ordination, clustering and fishtanks

Supervised Classification: from lines to trees to forests.

Open In Colab Open in nbviewer Vignette (Ayomikun Akinrinade): Using K-Nearest Neighbors and Binary Decision Tree Algorithms to Predict Enzyme Function from Protein Sequences

Chapter 13. Presenting Research

Open In Colab Open in nbviewer Presenting Research

Chapter 14. Polishing and Publishing

From Data to Conclusion: building a research manuscript brick by brick

Open In Colab Open in nbviewer Resistance is Futile: becoming a language Borg

Exercise: generating a targeted title using templating

The Inverted Pyramid: optimizing your text from a reader's perspective

Chapter 15. Careers that draw on Bioinformatics

Open in nbviewer Fighting for an Inclusive Workplace

Best practices for Success: Happiness Matters, Radical Collaboration, and Networking

Open-source Science as Shield and Sword

Open In Colab Open in nbviewer Applying for Grants

Appendices

Appendix A

Open in nbviewer Data Sources for Bioinformatics Projects

Appendix B

Timesaving Starter Code

Template Script with Interface and Test Code

IUPAC codes in Python

Standard Translation Tables in Python

Appendix C - Contributing a Vignette

Appendix D - Paper Formatting Kit

Appendix E - Project Specifications

Acknowledgements

This project is being developed with support from NSF Integrative and Organismal Systems award NSF-1942647.

Feedback

You can submit feedback about completed chapters at the following link