Commit

Update notebooks, add github workflow
anishmss committed Jan 10, 2025
1 parent 94a31f1 commit 9ef22a9
Showing 8 changed files with 154 additions and 14 deletions.
37 changes: 37 additions & 0 deletions .github/workflows/build-and-deploy.yml
@@ -0,0 +1,37 @@
name: Build and deploy Jupyter book
on:
  push:
    branches:
      - main

jobs:
  build-book:
    runs-on: ubuntu-latest
    permissions:
      pages: write
      id-token: write
    environment:
      name: github-pages
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Install Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"

      - name: Install Jupyter Book
        run: pip install jupyter-book

      - name: Build the book
        run: jupyter-book build .

      - name: Package the HTML files into an artifact for GitHub Pages
        uses: actions/upload-pages-artifact@v3
        with:
          path: "_build/html"

      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
7 changes: 4 additions & 3 deletions data-generation.md
@@ -1,10 +1,11 @@
# Data generation process

## From sample to sequences
## From samples to sequences

## How the statistician sees it

Metagenomics data is compositional data.
Most analysis will be based on relative abundance.
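
A minimal sketch of what "compositional" means in practice, assuming made-up taxon names and read counts (nothing here comes from the workshop data): two samples sequenced at different depths carry the same information once counts are converted to proportions.

```python
def closure(counts):
    """Convert raw counts to proportions that sum to 1 (relative abundance)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalize a sample with zero total counts")
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical counts: the same community sequenced at two different depths.
shallow = {"taxon_A": 600, "taxon_B": 300, "taxon_C": 100}
deep = {taxon: 10 * n for taxon, n in shallow.items()}

# Both samples yield the same proportions, because the total number of
# reads reflects sequencing depth, not the size of the community.
print(closure(shallow))
print(closure(deep))
```

The function name `closure` is only an illustrative choice; the point is that the absolute read totals are discarded and only the proportions are compared across samples.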

## Discussion
What kind of data are you working on?
Are you planning to use metagenomics for your study?
What is the experiment design?
10 changes: 6 additions & 4 deletions data-sneak-peek.ipynb
@@ -23,16 +23,18 @@
"| Finland | FH4 | ERR7015398 |\n",
"| Benin | BH1 | ERR7015311 |\n",
"| Benin | BH2 | ERR7015312 |\n",
"| Benin | BH3 | ERR7015313 |"
"| Benin | BH3 | ERR7015313 |\n",
"| Benin | BH4 | ERR7015315 |"
]
},
{
"cell_type": "markdown",
"id": "80c4431c-d848-4f05-bbe7-39346e72f20a",
"metadata": {},
"source": [
"You could obtain the dataset by doing a wget. \n",
"But for the purposes of the workshop, we will downsample each set to 1 million paired reads"
"You can obtain the dataset by doing a wget. \n",
"\n",
"For the purposes of this workshop, we will downsample each set to 1 million paired reads."
]
}
],
@@ -52,7 +54,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.5"
"version": "3.12.8"
}
},
"nbformat": 4,
5 changes: 2 additions & 3 deletions intro.md
@@ -1,7 +1,6 @@
# The Data Science of Metagenomics : Bioinformatics Workflows for Profiling Microbial Communities

<!-- # The Data Science of Metagenomics : Bioinformatics Workflows for Profiling Microbial Communities -->
# The Bioinformatics of Profiling Microbial Communities in Wastewater from Shotgun Metagenome Samples
## Workshop overview
%
Metagenomics is the application of high-throughput sequencing to DNA extracted directly from environmental, uncultured samples.
With sequencing becoming more accessible to labs in the Philippines, we are witnessing increasing use of metagenomics to study microbial communities in environmental, ecological, agricultural, epidemiological, and clinical settings.
The power of metagenomics, however, comes with the challenging task of handling and analyzing large volumes of sequence data.
101 changes: 101 additions & 0 deletions metaphlan.ipynb
@@ -0,0 +1,101 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "29db0336-059d-4e5d-aa87-85c669e06f05",
"metadata": {},
"source": [
"Let's use a toy database, since it takes a while to download the entire db. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "48e11d21-963f-42a1-980f-24db3153c3dc",
"metadata": {},
"outputs": [],
"source": [
"metaphlan --install --index mpa_vJan21_TOY_CHOCOPhlAnSGB_202103_bt2.tar --bowtie2db .\n"
]
},
{
"cell_type": "markdown",
"id": "32384307-6a99-4534-8736-5e3e4cf282f5",
"metadata": {},
"source": [
"http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a976a194-73ac-4c7c-aaf9-425da7b6622a",
"metadata": {},
"outputs": [],
"source": [
"metaphlan FH1_1.fastq.gz,FH1_2.fastq.gz --bowtie2db ../../metaphlan_databases/ --index mpa_vJan21_TOY_CHOCOPhlAnSGB_202103 --input_type fastq --bowtie2out FH1.bowtie2.bz2 --output FH1_profile"
]
},
{
"cell_type": "markdown",
"id": "5c3074ae-cb1b-4661-9520-94e551a06ab9",
"metadata": {},
"source": [
"-1 -2 doesn't seem to work"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "fbb89ab7-384d-4d65-91e7-a075cfeca294",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"antibiotic-resistance.md going-beyond.md notebooks-jb-website.ipynb\n",
"_build\t\t\t intro.md\t README.md\n",
"_config.yml\t\t lab_logo.png references.bib\n",
"data-analysis-overview.md LICENSE\t requirements.txt\n",
"data-generation.md\t metaphlan.ipynb taxonomic-profiling.md\n",
"data-sneak-peek.ipynb\t notebooks\t testing.ipynb\n",
"Dockerfile\t\t notebooks.ipynb _toc.yml\n"
]
}
],
"source": [
"! ls\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a656588e-f36a-41cf-978e-fa1a17e4d747",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
2 changes: 1 addition & 1 deletion notebooks-jb-website.ipynb
@@ -362,7 +362,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.0"
"version": "3.12.8"
},
"toc": {
"base_numbering": 1,
2 changes: 1 addition & 1 deletion notebooks.ipynb
@@ -129,7 +129,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.0"
"version": "3.12.8"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
4 changes: 2 additions & 2 deletions taxonomic-profiling.md
@@ -1,7 +1,7 @@
# Taxonomic Profiling
One typical task is to figure out the taxonomic composition of the sampled microbial community, i.e. to find out what species are present and in what proportion.
A typical task is to figure out the taxonomic composition of the sampled microbial community, i.e. to find out what species are present and in what proportion.

The basic idea is to assign each read (or read pair) to a sequence in a reference database
There are many tools available for this task. The basic idea behind all of them is to assign reads to sequences in a reference database, and then count the reads assigned to each sequence to compute relative abundance. Broadly speaking, there are two choices for the reference database.
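
The assign-then-count idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-read assignments below are hypothetical, the function name `relative_abundance` is made up, and real profilers typically do more (for example, accounting for genome or marker length) before reporting abundances.

```python
from collections import Counter


def relative_abundance(read_assignments):
    """Return {taxon: fraction of assigned reads}; unassigned reads (None) are ignored."""
    counts = Counter(t for t in read_assignments if t is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical output of the assignment step: one taxon label per read,
# with None marking a read that matched nothing in the reference database.
assignments = [
    "E. coli", "E. coli", "B. fragilis", None,
    "E. coli", "B. fragilis", "K. pneumoniae", "E. coli",
]

profile = relative_abundance(assignments)
for taxon, fraction in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f"{taxon}\t{fraction:.3f}")
```

Dropping unassigned reads before normalizing is a design choice in this sketch; some tools instead report the unassigned fraction as its own category.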

## Using comprehensive whole-genome databases
