Update notebooks, add github workflow

bioinfodlsu · Jan 10, 2025 · 9ef22a9
1 parent 94a31f1
Showing 8 changed files with 154 additions and 14 deletions.
diff --git a/.github/workflows/build-and-deploy.yml b/.github/workflows/build-and-deploy.yml
@@ -0,0 +1,37 @@
+name: Build and deploy Jupyter book
+on:
+  push:
+    branches:
+    - main
+
+jobs:
+  build-book:
+    runs-on: ubuntu-latest
+    permissions:
+      pages: write
+      id-token: write
+    environment:
+      name: github-pages
+    steps:
+    - name: Checkout repository
+      uses: actions/checkout@v3
+
+    - name: Install Python
+      uses: actions/setup-python@v5
+      with:
+        python-version: "3.10"
+
+    - name: Install Jupyter Book
+      run: pip install jupyter-book
+
+    - name: Build the book
+      run: jupyter-book build .
+
+    - name: Package the HTML files into an artifact for GitHub Pages
+      uses: actions/upload-pages-artifact@v3
+      with:
+        path: "_build/html"
+
+    - name: Deploy to GitHub Pages
+      id: deployment
+      uses: actions/deploy-pages@v4
diff --git a/data-generation.md b/data-generation.md
@@ -1,10 +1,11 @@
 # Data generation process
 
-## From sample to sequences
+## From samples to sequences
 
 ## How the statistician sees it
-
+Metagenomics data is compositional data.
+Most analysis will be based on relative abundance.
 
 ## Discussion
-What kind of data are you working on?
+Are you planning to use metagenomics for your study?
 What is the experiment design?
diff --git a/data-sneak-peek.ipynb b/data-sneak-peek.ipynb
@@ -23,16 +23,18 @@
     "| Finland | FH4 | ERR7015398 |\n",
     "| Benin   | BH1 | ERR7015311 |\n",
     "| Benin   | BH2 | ERR7015312 |\n",
-    "| Benin | BH3 | ERR7015313 |"
+    "| Benin | BH3 | ERR7015313 |\n",
+    "| Benin | BH4 | ERR7015315 |"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "80c4431c-d848-4f05-bbe7-39346e72f20a",
    "metadata": {},
    "source": [
-    "You could obtain the dataset by doing a wget. \n",
-    "But for the purposes of the workshop, we will downsample each set to 1 million paired reads"
+    "You can obtain the dataset by doing a wget. \n",
+    "\n",
+    "For the purposes of this workshop, we will downsample each set to 1 million paired reads."
    ]
   }
  ],
@@ -52,7 +54,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.12.5"
+   "version": "3.12.8"
   }
  },
  "nbformat": 4,

diff --git a/intro.md b/intro.md
@@ -1,7 +1,6 @@
-# The Data Science of Metagenomics : Bioinformatics Workflows for Profiling Microbial Communities
-
+<!-- # The Data Science of Metagenomics : Bioinformatics Workflows for Profiling Microbial Communities -->
+# The Bioinformatics of Profiling Microbial Communities in Wastewater from Shotgun Metagenome Samples 
 ## Workshop overview
-%
 Metagenomics is the application of high-throughput sequencing to DNA extracted directly from environmental, uncultured samples. 
 With sequencing becoming more accessible to labs in the Philippines, we are witnessing increasing use of metagenomics to study microbial communities in environmental, ecological, agricultural, epidemiological, and clinical settings.
 The power of metagenomics, however, comes with the challenging task of handling and analyzing large volumes of sequence data.

diff --git a/metaphlan.ipynb b/metaphlan.ipynb
@@ -0,0 +1,101 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "29db0336-059d-4e5d-aa87-85c669e06f05",
+   "metadata": {},
+   "source": [
+    "Let's use a toy database, since it takes a while to download the entire db. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "48e11d21-963f-42a1-980f-24db3153c3dc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "metaphlan --install --index mpa_vJan21_TOY_CHOCOPhlAnSGB_202103_bt2.tar --bowtie2db .\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "32384307-6a99-4534-8736-5e3e4cf282f5",
+   "metadata": {},
+   "source": [
+    "http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a976a194-73ac-4c7c-aaf9-425da7b6622a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "metaphlan FH1_1.fastq.gz,FH1_2.fastq.gz --bowtie2db ../../metaphlan_databases/ --index mpa_vJan21_TOY_CHOCOPhlAnSGB_202103 --input_type fastq --bowtie2out FH1.bowtie2.bz2 --output FH1_profile"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5c3074ae-cb1b-4661-9520-94e551a06ab9",
+   "metadata": {},
+   "source": [
+    "-1 -2 doesn't seem to work"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "fbb89ab7-384d-4d65-91e7-a075cfeca294",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "antibiotic-resistance.md   going-beyond.md  notebooks-jb-website.ipynb\n",
+      "_build\t\t\t   intro.md\t    README.md\n",
+      "_config.yml\t\t   lab_logo.png     references.bib\n",
+      "data-analysis-overview.md  LICENSE\t    requirements.txt\n",
+      "data-generation.md\t   metaphlan.ipynb  taxonomic-profiling.md\n",
+      "data-sneak-peek.ipynb\t   notebooks\t    testing.ipynb\n",
+      "Dockerfile\t\t   notebooks.ipynb  _toc.yml\n"
+     ]
+    }
+   ],
+   "source": [
+    "! ls\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a656588e-f36a-41cf-978e-fa1a17e4d747",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/notebooks-jb-website.ipynb b/notebooks-jb-website.ipynb
@@ -362,7 +362,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.13.0"
+   "version": "3.12.8"
   },
   "toc": {
    "base_numbering": 1,

diff --git a/notebooks.ipynb b/notebooks.ipynb
@@ -129,7 +129,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.13.0"
+   "version": "3.12.8"
   },
   "widgets": {
    "application/vnd.jupyter.widget-state+json": {

diff --git a/taxonomic-profiling.md b/taxonomic-profiling.md
@@ -1,7 +1,7 @@
 # Taxonomic Profiling
-One typical task is to figure out the taxonomic composition of the sampled microbial community, i.e. to find out what species are present and in what proportion. 
+A typical task is to figure out the taxonomic composition of the sampled microbial community, i.e. to find out what species are present and in what proportion. 
 
-The basic idea is to assign each read (or read pair) to a sequence in a reference database
+There are many tools available for this task. The basic idea behind all of them is to assign reads to sequences in a reference database, and then count the assigned reads to compute relative abundance. Broadly speaking, there are two choices for the reference database.
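
As a rough illustration of the counting step described in this paragraph, the sketch below turns per-read taxon assignments into relative abundances. It is a minimal sketch rather than how any particular profiler works: the function name `relative_abundance`, the example species, and the convention of marking unassigned reads with `None` are all assumptions made for illustration. Real tools usually apply further corrections, such as normalizing by genome or marker length.

```python
from collections import Counter


def relative_abundance(read_assignments):
    """Return each taxon's share of the assigned reads (fractions summing to 1).

    `read_assignments` holds one entry per read (or read pair): the taxon the
    read was assigned to, or None if it matched nothing in the database.
    """
    # Count reads per taxon, ignoring unassigned reads.
    counts = Counter(taxon for taxon in read_assignments if taxon is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Divide each count by the total so the profile is compositional.
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical assignments for five reads; one read is unassigned.
assignments = [
    "Escherichia coli",
    "Escherichia coli",
    "Bacteroides fragilis",
    None,
    "Escherichia coli",
]
profile = relative_abundance(assignments)
for taxon, fraction in sorted(profile.items()):
    print(f"{taxon}\t{fraction:.2f}")
```

Because the fractions are forced to sum to 1, an increase in one taxon's share necessarily lowers the others, which is the compositional property noted in `data-generation.md`.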
 
 ## Using comprehensive whole-genome databases