Pre-requisite for this workshop: The Basic Data Skills Introduction to the command-line interface workshop or a working knowledge of the command line and cluster computing.
Time | Topic | Instructor |
---|---|---|
09:30 - 09:45 | Workshop Introduction | Radhika |
09:45 - 10:25 | Working in an HPC environment - Review | Radhika |
10:25 - 11:05 | Project Organization (using Data Management best practices) | Mary |
11:05 - 11:45 | Quality Control of Sequence Data: Running FASTQC | Jihe |
11:45 - 12:00 | Overview of self-learning materials and homework submission | Jihe |
- Please study the contents and work through all the code within the following lessons:
  - Quality Control of Sequence Data: Running FASTQC on multiple samples
  - Quality Control of Sequence Data: Evaluating FASTQC reports
  NOTE: To run through the code above, you will need to be logged into O2 and working on a compute node (i.e. your command prompt should have the word `compute` in it).
  1. Log in using `ssh [email protected]` and enter your password (replace the "XX" in the username with the number you were assigned in class).
  2. Once you are on the login node, use `srun --pty -p interactive -t 0-2:30 --mem 1G /bin/bash` to get on a compute node, or use the command specified in the lesson.
  3. Proceed only once your command prompt has the word `compute` in it.
  4. If you log out between lessons (using the `exit` command twice), please follow points 1. and 2. above to log back in and get on a compute node when you restart the self-learning.
- Complete the exercises:
  - Each lesson above contains exercises; please go through each of them.
  - Copy over your code from the exercises into a text file.
  - Upload the saved text file to Dropbox the day before the next class.
- If you get stuck due to an error while running code in the lesson, email us.
- Post any conceptual questions that you would like to have reviewed in class here.
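
The note above asks you to proceed only when the prompt contains the word `compute`. A minimal sketch of the same check as a shell function follows. It assumes the prompt shows the host name (common on clusters, but not confirmed here); the function name `on_compute_node` is hypothetical and not part of the lessons.

```shell
# Hypothetical helper: succeeds when a host name contains the word "compute".
# Assumption: the command prompt reflects the host name of the node.
on_compute_node() {
    # Use the first argument if given, otherwise the current host name.
    local host="${1:-$(hostname)}"
    case "$host" in
        *compute*) return 0 ;;
        *)         return 1 ;;
    esac
}

if on_compute_node; then
    echo "On a compute node: OK to run the lesson code."
else
    echo "Not on a compute node: request an interactive session first."
fi
```

Passing a name explicitly, for example `on_compute_node login01`, lets you try the check without being on the cluster.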
Time | Topic | Instructor |
---|---|---|
09:30 - 10:30 | Self-learning lessons review | All |
10:30 - 11:10 | Sequence Alignment Theory | Radhika |
11:10 - 11:50 | Quantifying expression using alignment-free methods (Salmon) | Mary |
11:50 - 12:00 | Review of workflow | Radhika |
- Please study the contents and work through all the code within the following lessons:
  - Quantifying expression using alignment-free methods (Salmon on multiple samples)
  - Documenting Steps in the Workflow with MultiQC
  NOTE: To run through the code above, you will need to be logged into O2 and working on a compute node (i.e. your command prompt should have the word `compute` in it).
  1. Log in using `ssh [email protected]` and enter your password (replace the "XX" in the username with the number you were assigned in class).
  2. Once you are on the login node, use `srun --pty -p interactive -t 0-2:30 --mem 8G /bin/bash` to get on a compute node, or use the command specified in the lesson.
  3. Proceed only once your command prompt has the word `compute` in it.
  4. If you log out between lessons (using the `exit` command twice), please follow points 1. and 2. above to log back in and get on a compute node when you restart the self-learning.
- Complete the exercises:
  - Each lesson above contains exercises; please go through each of them.
  - Copy over your code from the exercises into a text file.
  - Upload the saved text file to Dropbox the day before the next class.
- If you get stuck due to an error while running code in the lesson, email us.
- Post any conceptual questions that you would like to have reviewed in class here.
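
The two self-learning blocks use the same `srun` command but request different amounts of memory (1G for the first set of lessons, 8G for the second). As a rough sketch, the command can be built from a single memory argument. The function name `interactive_cmd` is hypothetical; the flags themselves are copied from the notes above, and the function only prints the command so that you can inspect it before running it.

```shell
# Hypothetical helper: print the interactive-session command from the notes,
# with the memory request as the only variable part (default: 1G).
interactive_cmd() {
    local mem="${1:-1G}"
    # --pty           run the job in a pseudo-terminal (interactive shell)
    # -p interactive  submit to the partition named "interactive"
    # -t 0-2:30       time limit: 0 days, 2 hours, 30 minutes
    # --mem           memory requested for the session
    echo "srun --pty -p interactive -t 0-2:30 --mem ${mem} /bin/bash"
}

interactive_cmd       # command used for the first set of lessons
interactive_cmd 8G    # command used for the Salmon and MultiQC lessons
```

If a lesson specifies a different command, use that one instead.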
Time | Topic | Instructor |
---|---|---|
09:30 - 10:10 | Self-learning lessons review | All |
10:10 - 10:45 | Troubleshooting RNA-seq Data Analysis | Mary |
10:45 - 11:45 | Automating the RNA-seq workflow | Radhika |
11:45 - 12:00 | Wrap up | Radhika |
- Downloadable Answer Keys (Day 2 exercises):
- Downloadable Answer Keys (Day 3 exercises):
- Video about statistics behind salmon quantification
- Advanced bash for working on O2:
- Obtaining reference genomes or transcriptomes
- Introduction to R workshop materials
- Introduction to Differential Gene Expression analysis (bulk RNA-seq) workshop materials
These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.