---
layout: page
title: Lectures
---
All lectures are Wed/Fri 1:00-4:00 pm in TATA 2501
(<a href="https://goo.gl/maps/Cd8z9Zexx6q">Map</a>). Clicking on the
class topics below will take you to corresponding lecture notes,
homework assignments, pre-class video screen-casts and required reading
material.
<br>
| \# | Date | Topics for Winter 2020 |
| :-: | :----------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| 1 | Wed 01/08/20 | [**Welcome to Bioinformatics**](#1) <br> Course introduction, Learning goals & expectations, Biology is an information science, History of Bioinformatics, Types of data, Application areas and introduction to upcoming course segments, Hands on with major Bioinformatics databases and key online NCBI and EBI resources |
| 2 | Fri 01/10/20 | [**Sequence alignment fundamentals, algorithms and applications**](#2) <br> Homology, Sequence similarity, Local and global alignment, classic Needleman-Wunsch, Smith-Waterman and BLAST heuristic approaches, Hands on with dot plots, Needleman-Wunsch and BLAST algorithms highlighting their utility and limitations |
| 3 | Wed 01/15/20 | [**Advanced sequence alignment and database searching**](#3) <br> Detecting remote sequence similarity, Database searching beyond BLAST, Substitution matrices, Using PSI-BLAST, Profiles and HMMs, Protein structure comparisons |
| 4 | Fri 01/17/20 | [**Bioinformatics data analysis with R**](#4) <br> Why do we use R for bioinformatics? R language basics and the RStudio IDE, Major R data structures and functions, Using R interactively from the RStudio console |
| 5 | Wed 01/22/20 | [**Data exploration and visualization in R**](#5) <br> The exploratory data analysis mindset, Data visualization best practices, Using and customizing base graphics (scatterplots, histograms, bar graphs and boxplots), Building more complex charts with ggplot and rgl |
| 6 | Fri 01/24/20 | [**Why, when and how of writing your own R functions**](#6) <br> The basics of writing your own functions that promote code robustness, reduce duplication and facilitate code re-use |
| 7 | Wed 01/29/20 | [**Guest Lecture**: Epigenetics and 3D genome organization](#7) <br> Guest lecture from [Ferhat Ay (LJI)](https://www.lji.org/faculty-research/labs/ay/#overview) introducing epigenetics research and supporting bioinformatics methods and tools. |
| 8 | Fri 01/31/20 | [**Bioinformatics R packages from CRAN and BioConductor**](#8) <br> Extending functionality and utility with R packages, Obtaining R packages from CRAN and BioConductor, Working with Bio3D for molecular data |
| 9 | Wed 02/05/20 | [**Introduction to machine learning for Bioinformatics 1**](#9) <br> Unsupervised learning, K-means clustering, Hierarchical clustering, Heatmap representations. Dimensionality reduction, Principal Component Analysis (PCA) |
| 10 | Fri 02/07/20 | [**Unsupervised learning mini-project**](#10) <br> Longer hands-on session with unsupervised learning analysis of cancer cells further highlighting practical considerations and best practices for the analysis and visualization of high dimensional datasets |
| 11 | Wed 02/12/20 | **Project:** [**Find a gene assignment (Part 1)**](#11) <br> Principles of database searching, sequence analysis, structure analysis along with [**Hands-on with Git**](#11) <br> How to perform common operations with the Git version control system. We will also cover the popular social code-hosting platforms GitHub and BitBucket. |
| 12 | Fri 02/14/20 | [**Structural Bioinformatics (Part 1)**](#12) <br> Protein structure function relationships, Protein structure and visualization resources, Modeling energy as a function of structure |
| 13 | Wed 02/19/20 | [**Bioinformatics in drug discovery and design**](#13) <br> Target identification, Lead identification, Small molecule docking methods, Protein motion and conformational variants, Molecular simulation and drug optimization |
| 14 | Fri 02/21/20 | [**Genome informatics and high throughput sequencing (Part 1)**](#14) <br> Genome sequencing technologies past, present and future; Biological applications of sequencing, Analysis of variation in the genome, and gene expression; The Galaxy platform along with resources from the EBI & UCSC; Sample Galaxy RNA-Seq workflow with FastQC and Bowtie2 |
| 15 | Wed 02/26/20 | [**Transcriptomics and the analysis of RNA-Seq data**](#15) <br> RNA-Seq aligners, Differential expression tests, RNA-Seq statistics, Counts and FPKMs and avoiding P-value misuse, Hands-on analysis of RNA-Seq data with R. <br> **N.B.** Find a gene assignment part 1 due today\! |
| 16 | Fri 02/28/20 | [**Genome annotation and the interpretation of gene lists**](#16) <br> Gene finding and functional annotation, Functional databases KEGG, InterPro, GO ontologies and functional enrichment |
| 17 | Wed 03/04/20 | [**Biological network analysis**](#17) <br> Network based approaches for integrating and interpreting large heterogeneous high throughput data sets; Discovering relationships in ‘omics’ data; Network construction, manipulation, visualization and analysis; Major graph theory and network topology measures and concepts (Degree, Communities, Shortest Paths, Centralities, Betweenness, Random graphs vs scale free); Hands-on with Cytoscape and igraph packages. |
| 18 | Fri 03/06/20 | [**Cancer genomics**](#18) <br> Cancer genomics resources and bioinformatics tools for investigating the molecular basis of cancer. Mining the NCI Genomic Data Commons; Immunoinformatics and immunotherapy; Using genomics and bioinformatics to design a personalized cancer vaccine. Implications for personalized medicine. <br> **N.B.** Find a gene assignment due before next class\! |
| 19 | Wed 03/11/20 | [**Course summary**](#19) <br> Summary of learning goals, Student course evaluation time and exam preparation; **Find a gene assignment due\!** |
| 20 | Fri 03/13/20 | [**Final exam\!**](#20) |
Class material
==============
<a name="1"></a>
1: Welcome to Bioinformatics
----------------------------
**Topics**:
Course introduction, Learning goals & expectations, Biology is an information science, History of Bioinformatics, Types of data, Application areas and introduction to upcoming course segments, Student 30-second introductions, Introduction to NCBI & EBI resources for the molecular domain of bioinformatics, Hands-on session using NCBI-BLAST, Entrez, GENE, UniProt, Muscle and PDB bioinformatics tools and databases.
**Goals**:
- Understand the increasing necessity for computation in modern life sciences research.
- Get introduced to how bioinformatics is practiced.
- Understand course scope, expectations, logistics and [ethics code]({{ site.baseurl }}/ethics/).
- The goal of the hands-on session is to introduce a range of core bioinformatics databases and associated online services whilst actively investigating the molecular basis of several common human diseases.
**Material**:
- Lecture Slides: [Large PDF]({{ site.baseurl }}/class-material/lecture-1-bggn213_large.pdf){:.no-push-state}{:target="_blank"}, [Small PDF]({{ site.baseurl }}/class-material/lecture-1-bggn213_small.pdf){:.no-push-state}{:target="_blank"}
- Lab: [Hands-on Worksheet]({{ site.baseurl }}/class-material/lab-1-bggn213.pdf){:.no-push-state}{:target="_blank"}
- Feedback: [Muddy-Point-Assessment](https://forms.gle/2YGfHU4y7JVyH4bt5){:.no-push-state}{:target="_blank"}
**Homework**:
- [Questions](https://forms.gle/83wFo4orukydNvDR7){:.no-push-state}{:target="_blank"}
- Complete the [pre-course survey](https://forms.gle/qeQL4BQNa71dCnLq7).
- Setup your [laptop computer]({{ site.baseurl }}/setup/) for this course.
- Get a copy of the course [syllabus]({{ site.baseurl }}/class-material/BGGN213_F19_syllabus.pdf){:.no-push-state},
- Complete the [Office Hours Sign Up Sheet](https://doodle.com/poll/72hqd8ir3tv9ya38){:.no-push-state}{:target="_blank"}.
**Readings**:
- PDF1: [What is bioinformatics? An introduction and overview]({{ site.baseurl }}/class-material/bioinformatics_review.pdf){:.no-push-state},
- PDF2: [Advancements and Challenges in Computational Biology]({{ site.baseurl }}/class-material/bioinformatics_challenges_2015.pdf){:.no-push-state}.
**Screen Casts**:
<br/>
<iframe width="560" height="315" src="https://www.youtube.com/embed/P2oSO7YPyfU?rel=0" frameborder="0" allowfullscreen></iframe>
**1 Welcome to BGGN-213:**
Course introduction and logistics.
{:.message}
<br/>
<iframe width="560" height="315" src="https://www.youtube.com/embed/gJNXQfpErLY?rel=0" frameborder="0" allowfullscreen></iframe>
**2 What is Bioinformatics?**
Bioinformatics can mean different things to different people. What will we actually learn in this class?
{:.message}
<br/>
<iframe width="560" height="315" src="https://www.youtube.com/embed/cCim7LrQZLY?rel=0" frameborder="0" allowfullscreen></iframe>
**3 How do we do Bioinformatics?**
Some basic bioinformatics can be done online or with downloaded tools. However, most often we will need a specialized computational setup.
{:.message}
------------------------------------------------------------------------
<a name="2"></a>
2: Sequence alignment fundamentals, algorithms and applications
---------------------------------------------------------------
**Topics**:
Further coverage of *major NCBI & EBI resources* for the molecular domain of bioinformatics with a focus on GenBank, UniProt, Entrez and Gene Ontology. There are many bioinformatics databases (see [handout]({{ site.baseurl }}/class-material/Major_Databases_bggn213.pdf){:.no-push-state}) and being able to judge their utility and quality is important. *Sequence Alignment and Database Searching*:
Homology, Sequence similarity, Local and global alignment, Heuristic approaches, Database searching with BLAST, E-values and evaluating alignment scores and statistics.
**Goals**:
- Be able to query, search, compare and contrast the data contained in major bioinformatics databases (GenBank, GENE, UniProt, PFAM, OMIM, PDB) and describe how these databases intersect.
- Be able to describe how nucleotide and protein sequence and structure data are represented (FASTA, FASTQ, GenBank, UniProt, PDB).
- Be able to describe how dynamic programming works for pairwise sequence alignment
- Appreciate the differences between global and local alignment along with their major application areas.
- Understand how aligning novel sequences with previously characterized genes or proteins provides important insights into their common attributes and evolutionary origins.
- The goals of the hands-on session are to explore the principles underlying the computational tools that can be used to compute and evaluate sequence alignments.
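
As a rough illustration of the dynamic programming goal above, the following minimal sketch computes a Needleman-Wunsch style global alignment score. It is written in Python purely for illustration (the course labs use R). The function name `nw_score` and the scoring values (match +1, mismatch -1, gap -2) are assumptions made for this example and are not taken from the course material.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of strings a and b.

    Scoring values are illustrative assumptions, not course defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First column and row: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill each cell from its three neighbours: diagonal (match or mismatch),
    # up (gap in b) and left (gap in a), keeping the best score.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diag, up, left)

    # The bottom-right cell holds the score of the best end-to-end alignment.
    return score[-1][-1]


if __name__ == "__main__":
    print(nw_score("GATTACA", "GATTACA"))
    print(nw_score("AC", "AGC"))
```

A local (Smith-Waterman style) variant would differ mainly in flooring each cell at zero and reporting the maximum cell rather than the bottom-right one.
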
**Material**:
- Lecture Slides: [Large PDF]({{ site.baseurl }}/class-material/lecture-2-bggn213_large.pdf){:.no-push-state}{:target="_blank"}, [Small PDF]({{ site.baseurl }}/class-material/lecture-2-bggn213_small.pdf){:.no-push-state}{:target="_blank"}
- Lab: [Hands-on Worksheet]({{ site.baseurl }}/class-material/lab-2-bggn213.pdf){:.no-push-state}{:target="_blank"}
- Major Databases: [Handout PDF]({{ site.baseurl }}/class-material/Major_Databases.pdf){:.no-push-state}{:target="_blank"},
- Feedback: [Muddy-Point-Assessment](https://forms.gle/DBjDKg5azytyJrv86){:.no-push-state}{:target="_blank"}.
**Homework**:
- [Questions](https://forms.gle/faA93Hdq5G3qN9q78){:.no-push-state}{:target="_blank"},
- [Alignment Problem]({{ site.baseurl }}/class-material/lecture-2-bggn213_homework.pdf){:.no-push-state}{:target="_blank"},
**Readings**:
- PDF1: [What is dynamic programming?]({{ site.baseurl }}/class-material/Dynamic_programming_primer.pdf){:.no-push-state},
- PDF2: [Fundamentals of database searching]({{ site.baseurl }}/class-material/Fundamentals.pdf){:.no-push-state}.
------------------------------------------------------------------------
<a name="3"></a>
3: Advanced sequence alignment and database searching
-----------------------------------------------------
**Topics**: Detecting remote sequence similarity, Database searching beyond BLAST, Substitution matrices, Using PSI-BLAST, Profiles and HMMs, Protein structure comparisons. Beginning with command line based database searches.
**Goal**:
- Be able to calculate the alignment score between protein (or nucleotide) sequences using a provided scoring matrix such as BLOSUM62.
- Understand the limits of homology detection with tools such as BLAST.
- Know how to derive a PROSITE style regular expression for aligned motifs.
- Be able to calculate a PSSM profile for aligned sequences and subsequently score new sequences using a PSSM.
- Be able to perform PSI-BLAST, HMMER and protein structure based database searches and interpret the results in terms of the biological significance of an e-value.
- Be familiar with the concepts of True Positives, False Positives, Sensitivity and Specificity.
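
To make the last goal concrete, sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The short Python sketch below (illustrative only, since the course labs use R) computes both from confusion-matrix counts. The function name and the example counts are hypothetical and do not come from any real database search.

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true homolog and one non-homolog")
    sensitivity = tp / (tp + fn)  # fraction of true homologs that were found
    specificity = tn / (tn + fp)  # fraction of non-homologs correctly rejected
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical search: 100 true homologs and 100 unrelated sequences.
    sens, spec = sensitivity_specificity(tp=90, fp=20, tn=80, fn=10)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Loosening an E-value cutoff typically raises sensitivity at the cost of specificity, which is why the two are usually reported together.
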
**Material**:
- Lecture Slides: [Large PDF]({{ site.baseurl }}/class-material/lecture-3-bggn213_large.pdf){:.no-push-state}{:target="_blank"}, [Small PDF]({{ site.baseurl }}/class-material/lecture-3-bggn213_small.pdf){:.no-push-state}{:target="_blank"},
- Lab: [Hands-on Worksheet]({{ site.baseurl }}/class-material/lab-3-bggn213.pdf){:.no-push-state}{:target="_blank"},
- Bonus: [Alignment App](https://bioboot.github.io/bggn213_W20/class-material/nw/){:.no-push-state}{:target="_blank"},
- Feedback: [Muddy-Point-Assessment](https://forms.gle/NNwje57RUJy8AQ5g7){:.no-push-state}
**Homework**:
- [Homework](https://docs.google.com/document/d/1C4hBJCqbk_rO2ImioCHTsXJSHgMNhqaUB2WS0De4HEs/copy){:.no-push-state}{:target="_blank"} click and select "make a copy" then follow instructions,
- DataCamp Sign Up & Homework: [See your UCSD email invite!](https://www.datacamp.com){:.no-push-state},
- [RStudio and R download and setup]({{ site.baseurl }}/setup/).
------------------------------------------------------------------------
<a name="4"></a>
4: Bioinformatics data analysis with R
--------------------------------------
**Topics**: Why do we use R for bioinformatics? R language basics and the RStudio IDE, Major R data structures and functions, Using R interactively from the RStudio console.
**Goal**:
- Understand why we use R for bioinformatics
- Familiarity with R's basic syntax,
- Be able to use R to read and parse comma-separated (.csv) formatted files ready for subsequent analysis,
- Familiarity with major R data structures (vectors, matrices and data.frames),
- Understand the basics of using functions (arguments, vectorization and recycling).
**Material**:
- Lecture Slides: [Large PDF]({{ site.baseurl }}/class-material/lecture-4-bggn213_large.pdf){:.no-push-state}{:target="_blank"}, [Small PDF]({{ site.baseurl }}/class-material/lecture-4-bggn213_small.pdf){:.no-push-state}{:target="_blank"}
- Lab: [Hands-on Worksheet]({{ site.baseurl }}/class-material/lab-4-bggn213/){:.no-push-state}{:target="_blank"}
- Feedback: [Muddy-Point-Assessment](https://goo.gl/forms/0ZILA8Y4yb30LL1q2){:.no-push-state}
**Homework**:
- Due today: [DataCamp, Intro to R!](https://www.datacamp.com/){:.no-push-state}{:target="_blank"}.
- Due next week: [DataCamp, Intermediate R!](https://www.datacamp.com/){:.no-push-state}{:target="_blank"}.
------------------------------------------------------------------------
<a name="5"></a>
5: Data exploration and visualization in R
------------------------------------------
**Topics**: The exploratory data analysis mindset, Data visualization best practices, Simple base graphics (including scatterplots, histograms, bar graphs, dot charts, boxplots and heatmaps), Building more complex charts with ggplot.
**Goal**:
- Appreciate the major elements of exploratory data analysis and why it is important to visualize data.
- Be conversant with data visualization best practices and understand how good visualizations optimize for the human visual system.
- Be able to generate informative graphical displays including scatterplots, histograms, bar graphs, boxplots, dendrograms and heatmaps and thereby gain exposure to the extensive graphical capabilities of R.
- Appreciate that you can build even more complex charts with ggplot and additional R packages such as rgl.
**Material**:
- Lecture Slides: [Large PDF]({{ site.baseurl }}/class-material/lecture5-BGGN213-large.pdf){:.no-push-state}{:target="_blank"}, [Small PDF]({{ site.baseurl }}/class-material/lecture5-BGGN213-small.pdf){:.no-push-state}{:target="_blank"},
- Rmarkdown documents for [plot session 1]({{ site.baseurl }}/class-material/lecture-5-bggn213-draw_circle_points/){:.no-push-state}, and [more advanced plots]({{ site.baseurl }}/class-material/lecture-5-bggn213-plots/){:.no-push-state},
- Lab: [Main **hands-on Worksheet**]({{ site.baseurl }}/class-material/lab-5-bggn213.html){:.no-push-state}{:target="_blank"},
- Lab: [**Supplement 1**: Plotting with color in R]({{ site.baseurl }}/class-material/Rcolor.html){:.no-push-state}{:target="_blank"},
- Lab: [**Supplement 2**: A detailed guide to plotting with base R]({{ site.baseurl }}/class-material/lecture5-BGGN213_lab.pdf){:.no-push-state}{:target="_blank"},
- Example data for hands-on sections [lecture-5-bggn213-rstats.zip]({{ site.baseurl }}/class-material/lecture-5-bggn213-rstats.zip){:.no-push-state},
- SideNote: [Convincing with graphics](https://xkcd.com/833/){:.no-push-state}{:target="_blank"},
- Check-out the new website: [Data-to-Viz](https://www.data-to-viz.com/){:.no-push-state}{:target="_blank"},
- Feedback: [Muddy-Point-Assessment](https://goo.gl/forms/qIW4O4PUoixTzy7J2){:.no-push-state}{:target="_blank"}.
**Homework**:
- This unit's homework is all [via **DataCamp** (Intro to R, Intermediate R)](https://www.datacamp.com/){:.no-push-state}{:target="_blank"}.
------------------------------------------------------------------------
<a name="6"></a>
6: Why, when and how of writing your own R functions
----------------------------------------------------
**Topics**: Using R scripts and Rmarkdown files, Import data in various formats both local and from online sources, The basics of writing your own functions that promote code robustness, reduce duplication and facilitate code re-use.
**Goals**:
- Be able to import data in various flat file formats from both local and online sources.
- Understand the structure and syntax of R functions and how to view the code of any R function.
- Understand when you should be writing functions.
- Be able to follow a step by step process of going from a working code snippet to a more robust function.
**Material**:
- Lecture Slides: [Large PDF]({{ site.baseurl }}/class-material/lecture-6-bggn213_large.pdf){:.no-push-state}{:target="_blank"}, [Small PDF]({{ site.baseurl }}/class-material/lecture-6-bggn213_small.pdf){:.no-push-state}{:target="_blank"},
- Lab: [Hands-on Worksheet]({{ site.baseurl }}/class-material/lab-6-bggn213.pdf){:.no-push-state}{:target="_blank"},
- Flat files for importing with read.table: [test1.txt]({{ site.baseurl }}/class-material/test1.txt){:.no-push-state}, [test2.txt]({{ site.baseurl }}/class-material/test2.txt){:.no-push-state}, [test3.txt]({{ site.baseurl }}/class-material/test3.txt){:.no-push-state}.
- Feedback: [Muddy-Point-Assessment](https://goo.gl/forms/GrFc3oDfAwCCj2BA2){:.no-push-state}{:target="_blank"}.
**Homework**:
- See **Q6** of the [hands-on lab sheet above]({{ site.baseurl }}/class-material/lab-6-bggn213.pdf){:.no-push-state}{:target="_blank"}. This entails turning a supplied code snippet into a more robust and re-usable function that will take any of the three listed input proteins and plot the effect of drug binding. Note assessment rubric and submission instructions within document. (Submission deadline: 1pm **next week!**).
- The remainder of this unit's homework is all [via **DataCamp**](https://www.datacamp.com/){:.no-push-state}.
------------------------------------------------------------------------
<a name="7"></a>
7: Epigenetics and 3D genome organization
-------------------------------------------------------
**Topics**: Guest lecture from [Ferhat Ay (LJI)](https://www.lji.org/faculty-research/labs/ay/#overview){:.no-push-state}{:target="_blank"} introducing epigenetics research and supporting bioinformatics methods and tools.
**Material**:
- Lecture Slides: [Large PDF]({{ site.baseurl }}/class-material/LectureSlides-01-29-2020-FerhatAy.pdf){:.no-push-state}{:target="_blank"}
- Supporting files: [simulateCpGmethylation.r]({{ site.baseurl }}/class-material/simulateCpGmethylation.r){:.no-push-state}, [ImmuneCell-ChIPseq-PCHiC.json]({{ site.baseurl }}/class-material/ImmuneCell-ChIPseq-PCHiC.json){:.no-push-state}.
------------------------------------------------------------------------
<a name="8"></a>
8: Bioinformatics R packages from CRAN and BioConductor
-------------------------------------------------------
**Topics**: More on how to write R functions with worked examples. Further extending functionality and utility with R packages, Obtaining R packages from CRAN and Bioconductor, Working with Bio3D for molecular data, Managing genome-scale data with bioconductor.
**Goals**:
- Be able to find and install R packages from CRAN and bioconductor,
- Understand how to find and use package vignettes, demos, documentation, tutorials and source code repository where available.
- Be able to write and (re)use basic R scripts to aid with reproducibility.
**Material**:
- Lecture Slides: [Large PDF]({{ site.baseurl }}/class-material/lecture-7-bggn213_large.pdf){:.no-push-state}{:target="_blank"}, [Small PDF]({{ site.baseurl }}/class-material/lecture-7-bggn213_small.pdf){:.no-push-state}{:target="_blank"},
- Example input for **grade()** function: [student_homework.csv]({{ site.baseurl }}/class-material/student_homework.csv){:.no-push-state},
- [Collaborative Google Doc based notes on selected R packages](https://docs.google.com/document/d/1r5IZYqHOHHWzbucaenbFv5DiES2UI_zfzFRUbdBe2Xw/edit?usp=sharing){:.no-push-state}{:target="_blank"},
- Feedback: [Muddy-Point-Assessment](https://goo.gl/forms/LHI8L0QYVXChcNw02){:.no-push-state}.
**Homework**:
- See **Q6** of the [hands-on lab sheet from the previous class]({{ site.baseurl }}/class-material/lab-6-bggn213.pdf){:.no-push-state}. This entails turning a supplied code snippet into a more robust and re-usable function that will take any of the three listed input proteins and plot the effect of drug binding. Note assessment rubric and submission instructions within document. (Submission deadline: 1pm **next class!**).
- DataCamp [homework](https://www.datacamp.com/){:.no-push-state}.
------------------------------------------------------------------------
<a name="9"></a>
9: Introduction to machine learning for Bioinformatics (Part 1)
--------------------------------------------------------
**Topics**: Unsupervised learning, supervised learning and reinforcement learning; Focus on unsupervised learning, K-means clustering, Hierarchical clustering, Heatmap representations. Dimensionality reduction, visualization and analysis, Principal Component Analysis (PCA), Practical considerations and best practices for the analysis of high dimensional datasets.
**Goal**:
- Understand the major differences between unsupervised and supervised learning.
- Be able to create k-means and hierarchical cluster models in R
- Be able to describe how the k-means and bottom-up hierarchical cluster algorithms work.
- Know how to visualize and integrate clustering results and select good cluster models.
- Be able to describe in general terms how PCA works and its major objectives.
- Be able to apply PCA to high dimensional datasets and visualize and integrate PCA results (e.g. identify outliers, find structure in features and aid in complex dataset visualization).
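
As a hedged illustration of how k-means works (one of the goals above), here is a minimal Python sketch of the standard assign-then-update loop, often called Lloyd's algorithm. It is not the course's R workflow: the function name `kmeans_2d`, the caller-supplied starting centers and the toy points are all assumptions chosen to keep the example small and deterministic.

```python
def kmeans_2d(points, centers, max_iter=100):
    """Cluster 2D points with Lloyd's algorithm from given starting centers.

    Returns (labels, centers). Starting centers are supplied by the caller
    (an assumption for reproducibility; real tools usually pick them randomly).
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center
        # (squared Euclidean distance).
        labels = [
            min(
                range(len(centers)),
                key=lambda k: (p[0] - centers[k][0]) ** 2 + (p[1] - centers[k][1]) ** 2,
            )
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                new_centers.append(centers[k])  # leave an empty cluster's center in place
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centers = kmeans_2d(toy, centers=[(0, 0), (10, 10)])
    print(labels, centers)
```

Because the result depends on the starting centers, k-means is usually run from several random starts, keeping the run with the lowest within-cluster sum of squares.
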
**Material**:
- Lecture Slides: [Large PDF]({{ site.baseurl }}/class-material/lecture-8-bggn213_large.pdf){:.no-push-state}{:target="_blank"}, [Small PDF]({{ site.baseurl }}/class-material/lecture-8-bggn213_small.pdf){:.no-push-state}{:target="_blank"},
- WebApp: [Introduction to PCA]({{ site.baseurl }}/class-material/pca/){:.no-push-state}{:target="_blank"},
- Lab: [Hands-on section worksheet for PCA]({{ site.baseurl }}/class-material/lab-8-bggn213.html){:.no-push-state}{:target="_blank"},
- Data files: [UK_foods.csv]({{ site.baseurl }}/class-material/UK_foods.csv){:.no-push-state}, [expression.csv]({{ site.baseurl }}/class-material/expression.csv){:.no-push-state}.
- Feedback: [Muddy point assessment](https://forms.gle/rRdkKbaGR7gcywK76){:.no-push-state}.
------------------------------------------------------------------------
<a name="10"></a>
10: Unsupervised learning mini-project
-------------------------------------
**Topics**: Longer hands-on session with unsupervised learning analysis of cancer cells, Practical considerations and best practices for the analysis and visualization of high dimensional datasets.
**Goals**:
- Be able to import data and prepare for unsupervised learning analysis.
- Be able to apply and test combinations of PCA, k-means and hierarchical clustering to high dimensional datasets and critically review results.
**Material**:
- Lecture Slides: [Large PDF]({{ site.baseurl }}/class-material/lecture-9-bggn213_large.pdf){:.no-push-state}{:target="_blank"}, [Small PDF]({{ site.baseurl }}/class-material/lecture-9-bggn213_small.pdf){:.no-push-state}{:target="_blank"},
- Lab: [Hands-on Worksheet]({{ site.baseurl }}/class-material/lab-9-bggn213-WEBX.html){:.no-push-state}{:target="_blank"}
- Data file: [WisconsinCancer.csv]({{ site.baseurl }}/class-material/WisconsinCancer.csv){:.no-push-state}, [new_samples.csv]({{ site.baseurl }}/class-material/new_samples.csv){:.no-push-state}.
- Bio3D PCA App: [http://bio3d.ucsd.edu/pca-app/](http://bio3d.ucsd.edu/pca-app/){:.no-push-state}{:target="_blank"}.
- Feedback: [Muddy-Point-Assessment](https://goo.gl/forms/vHYEbuAmV2uMZEom2){:.no-push-state}
**Reading**:
- Bonus: [StackExchange discussion on PCA](https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa){:.no-push-state}.
- Book: [Statistics for Modern Biology](http://web.stanford.edu/class/bios221/book/index.html)
- 2019 Genome Biology review article [Machine learning and complex biological data](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1689-0){:.no-push-state}{:target="_blank"}.
- 2019 Pre-print, [Accuracy, Robustness and Scalability of Dimensionality Reduction Methods for Single Cell RNAseq Analysis](https://www.biorxiv.org/content/10.1101/641142v2.full){:.no-push-state}{:target="_blank"}.
------------------------------------------------------------------------
<a name="11"></a>
11: **Project:** Find a gene assignment (Part 1)
------------------------------------------------
The [**find-a-gene project**]({{ site.baseurl }}/class-material/Find_A_Gene_Project.pdf){:.no-push-state}{:target="_blank"} is a required assignment for BGGN-213. The objective of this assignment is for you to demonstrate your grasp of database searching, sequence analysis, structure analysis and the R environment that we have covered to date in class.
You may wish to consult the scoring rubric at the end of the above linked project description and the [**example report**]({{ site.baseurl }}/class-material/Find_A_Gene_Project_Example.pdf){:.no-push-state}{:target="_blank"} for format and content guidance.
Your responses to questions Q1-Q4 are due at the beginning of class **Wed Feb 26th** (02/26/20).
The complete assignment, including responses to all questions, is due at the beginning of class **Wed March 11th** (03/11/20).
Late responses will not be accepted under any circumstances.
## Bonus: Hands-on with Git
Today’s lecture and hands-on sessions introduce Git, currently the most popular version control system. We will learn how to perform common operations with Git and RStudio. We will also cover the popular social code-hosting platforms GitHub and BitBucket.
- Lecture Slides: [Large PDF]({{ site.baseurl }}/class-material/lecture-10-bggn213_large.pdf){:.no-push-state}{:target="_blank"}, [Small PDF]({{ site.baseurl }}/class-material/lecture-10-bggn213_small.pdf){:.no-push-state}{:target="_blank"},
- Lab: [Hands-on with Git](http://tinyurl.com/rclass-github){:.no-push-state}{:target="_blank"},
- Jenny's *Naming Things* Slides: [Large PDF]({{ site.baseurl }}/class-material/lecture10-naming-slides.pdf){:.no-push-state}{:target="_blank"},
- Feedback: [Muddy-Point-Assessment](https://goo.gl/forms/gMxIBT5jLbjXHQPE2){:.no-push-state}
------------------------------------------------------------------------
<a name="12"></a>
12: Structural Bioinformatics (Part 1)
--------------------------------------
**Topics**: Protein structure function relationships, Protein structure and visualization resources, Modeling energy as a function of structure, Homology modeling, Predicting functional dynamics, Inferring protein function from structure.
**Goals**:
- View and interpret the structural models in the PDB,
- Understand the classic `Sequence>Structure>Function` via energetics and dynamics paradigm,
- Be able to use VMD for biomolecular visualization and analysis,
- Appreciate the role of bioinformatics in mapping the ENERGY LANDSCAPE of biomolecules,
- Be able to use the Bio3D package for exploratory analysis of protein sequence-structure-function-dynamics relationships.
**Material**:
- Lecture Slides: [Large PDF]({{ site.baseurl }}/class-material/lecture-11-bggn213_large.pdf){:.no-push-state}{:target="_blank"}, [Small PDF]({{ site.baseurl }}/class-material/lecture-11-bggn213_small.pdf){:.no-push-state}{:target="_blank"},
- Lab: [Hands-on Worksheet]({{ site.baseurl }}/class-material/pdb_pca_labclass.html){:.no-push-state}{:target="_blank"},
- Software link: [VMD download](http://www.ks.uiuc.edu/Development/Download/download.cgi){:.no-push-state}{:target="_blank"},
- Guide to [VMD see section 2]({{ site.baseurl }}/class-material/lab-11-bggn213.pdf){:.no-push-state}{:target="_blank"},
- Feedback: [Muddy-Point-Assessment](https://forms.gle/epVKGejGRectHEdp8){:.no-push-state}.
------------------------------------------------------------------------
<a name="13"></a>
13: Bioinformatics in drug discovery and design
-----------------------------------------------
**Topics**: Bioinformatics approaches for drug discovery, Target & lead identification, Receptor/target-based approaches, Small molecule docking methods, Protein motion and conformational variants and functional dynamics; Molecular simulation and drug optimization.
**Goals**:
- Appreciate how bioinformatics can predict functional dynamics & further aid drug discovery,
- Be able to apply open-source *in silico* docking and virtual screening strategies for drug discovery,
- Appreciate how bioinformatics can predict the functional dynamics of biomolecules,
- Be able to use Bio3D for the analysis and prediction of protein flexibility,
- Understand the increasing role of bioinformatics in pharma and the drug discovery process in particular.
**Material**:
- Lecture Slides: [Large PDF]({{ site.baseurl }}/class-material/lecture-12-bggn213_large.pdf){:.no-push-state}{:target="_blank"}, [Small PDF]({{ site.baseurl }}/class-material/lecture-12-bggn213_small.pdf){:.no-push-state}{:target="_blank"},
- Lab: [Hands-on Worksheet]({{ site.baseurl }}/class-material/lab-12-bggn213.pdf){:.no-push-state}{:target="_blank"},
- Software download links: [AutoDock Tools](http://mgltools.scripps.edu/downloads){:.no-push-state}{:target="_blank"}, [AutoDock Vina](http://vina.scripps.edu/download.html){:.no-push-state}{:target="_blank"},
- Optional: [PyMol](https://pymol.org/2/){:.no-push-state}{:target="_blank"}, [License]({{ site.baseurl }}/class-material/license.lic){:.no-push-state}{:target="_blank"}.
- For **Mac only** [Xquartz](https://www.xquartz.org){:.no-push-state}{:target="_blank"},
- Optional backup files: [config.txt]({{ site.baseurl }}/class-material/config.txt){:.no-push-state}, [1hsg_protein.pdbqt]({{ site.baseurl }}/class-material/1hsg_protein.pdbqt){:.no-push-state}, [ligand.pdbqt]({{ site.baseurl }}/class-material/ligand.pdbqt){:.no-push-state}, [log.txt]({{ site.baseurl }}/class-material/log.txt){:.no-push-state}, [all.pdbqt]({{ site.baseurl }}/class-material/all.pdbqt){:.no-push-state}
- Feedback: [Muddy-Point-Assessment](https://goo.gl/forms/nHmtEwJB7xaEZHua2){:.no-push-state}
------------------------------------------------------------------------
<a name="14"></a>
14: Genome informatics and high throughput sequencing (Part 1)
--------------------------------------------------------------
**Topics**: Genome sequencing technologies past, present and future (Sanger, Shotgun, PacBio, Illumina, toward the $500 human genome), Biological applications of sequencing, Variation in the genome, RNA-Sequencing for gene expression analysis; Major genomic databases, tools and visualization resources from the EBI & UCSC, The Galaxy platform for quality control and analysis; Sample Galaxy RNA-Seq workflow with FastQC and Bowtie2
**Goals**:
- Appreciate and describe in general terms the rapid advances in sequencing technologies and the new areas of investigation that these advances have made accessible.
- Understand the process by which genomes are currently sequenced and the bioinformatics processing and analysis required for their interpretation.
- For a genomic region of interest (e.g. the neighborhood of a particular SNP), use a genome browser to view nearby genes, transcription factor binding regions, epigenetic information, etc.
- Be able to use the Galaxy platform for basic RNA-Seq analysis from raw reads to expression value determination.
- Understand the FASTQ file format and the information it holds.
- Understand the [SAM/BAM file format]({{ site.baseurl }}//class-material/sam_format/){:.no-push-state} and the information it holds.
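
As a small illustration of the FASTQ goal above, the following Python sketch reads four-line FASTQ records and decodes their quality strings. It is only a sketch under stated assumptions: the course labs use Galaxy and R rather than Python, the function name `parse_fastq` and the example read are hypothetical, and the code assumes unwrapped records with Phred+33 quality encoding.

```python
from typing import Iterator, List, Tuple


def parse_fastq(lines: List[str]) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, phred_scores) for each four-line FASTQ record.

    Assumptions (not taken from the course material): each record is exactly
    four lines (no wrapped sequences) and qualities use Phred+33 encoding.
    """
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (s.rstrip("\n") for s in lines[i:i + 4])
        # Line 1 starts with '@', line 3 starts with '+'.
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        # Line 4 holds one quality character per base on line 2.
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        # Phred+33: quality score = ASCII code of the character minus 33.
        scores = [ord(c) - 33 for c in qual]
        yield header[1:], seq, scores


# Hypothetical single-read example.
example = ["@read1", "GATTACA", "+", "IIIIII#"]
records = list(parse_fastq(example))
```

Each record carries an identifier, the base calls, and a per-base quality score; a Phred score Q corresponds to an estimated error probability of 10^(-Q/10).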
**Material**:
- Lecture Slides: [Large PDF]({{ site.baseurl }}/class-material/lecture-13-bggn213_large.pdf){:.no-push-state}{:target="_blank"}, [Small PDF]({{ site.baseurl }}/class-material/lecture-13-bggn213_small.pdf){:.no-push-state}{:target="_blank"},
- Lab: [Hands-on Worksheet]({{ site.baseurl }}/class-material/lab-13-bggn213.pdf){:.no-push-state}{:target="_blank"},
- RNA-Seq data files: [HG00109_1.fastq]({{ site.baseurl }}/class-material/HG00109_1.fastq){:.no-push-state}, [HG00109_2.fastq]({{ site.baseurl }}/class-material/HG00109_2.fastq){:.no-push-state}, [genes.chr17.gtf]({{ site.baseurl }}/class-material/genes.chr17.gtf){:.no-push-state}, [Expression genotype results]({{ site.baseurl }}/class-material/rs8067378_ENSG00000172057.6.txt){:.no-push-state}, [Example R script]({{ site.baseurl }}/class-material/lecture13_plot.r){:.no-push-state}{:target="_blank"}, [Example Rmd](https://github.com/bioboot/test_github/blob/master/lecture13_plot.md){:.no-push-state}{:target="_blank"}.
- [SAM/BAM file format description]({{ site.baseurl }}//class-material/sam_format/){:.no-push-state}{:target="_blank"}.
- Feedback: [Muddy-Point-Assessment](https://goo.gl/forms/uokTiQ3YStajFVIl1){:.no-push-state}
## IPs
- nt1 IP: http://107.23.47.165/galaxy
- nt2 IP: http://18.206.19.101/galaxy
- nt3 IP: http://18.210.213.47/galaxy
- nt4 IP: http://18.213.25.125/galaxy
- nt5 IP: http://18.233.39.50/galaxy
- nt6 IP: http://3.208.130.150/galaxy
- nt7 IP: http://3.210.4.176/galaxy
- nt8 IP: http://3.211.41.239/galaxy
- nt9 IP: http://3.212.58.204/galaxy
- nt10 IP: http://3.212.78.120/galaxy
- nt11 IP: http://3.215.105.211/galaxy
- nt12 IP: http://3.221.13.134/galaxy
- nt13 IP: http://3.221.138.207/galaxy
- nt14 IP: http://3.221.25.194/galaxy
- nt15 IP: http://3.231.195.172/galaxy
- nt16 IP: http://3.233.175.78/galaxy
- nt17 IP: http://3.233.51.90/galaxy
- nt18 IP: http://3.89.151.246/galaxy
- nt19 IP: http://3.94.155.106/galaxy
- nt20 IP: http://34.192.179.40/galaxy
- nt21 IP: http://34.200.165.54/galaxy
- nt22 IP: http://34.204.165.71/galaxy
- nt23 IP: http://34.205.38.109/galaxy
- nt24 IP: http://34.225.10.96/galaxy
- nt25 IP: http://34.226.129.36/galaxy
- nt26 IP: http://34.226.166.168/galaxy
- nt27 IP: http://34.232.156.170/galaxy
- nt28 IP: http://34.236.90.80/galaxy
- nt29 IP: http://34.239.11.51/galaxy
- nt30 IP: http://50.17.174.193/galaxy
- **HOLD** ntbarry IP: http://52.20.155.201/galaxy (BG)
------------------------------------------------------------------------
<a name="15"></a>
15: Transcriptomics and the analysis of RNA-Seq data
----------------------------------------------------
**Topics**:
Analysis of RNA-Seq data with R, Differential expression tests, RNA-Seq statistics, Counts and FPKMs, Normalizing for sequencing depth and gene length, Hands-on analysis of RNA-Seq data with R, DESeq2 analysis. **N.B.** Find a gene assignment part 1 due today!
**Goals**:
- Given an RNA-Seq dataset, find the set of significantly differentially expressed genes and their annotations.
- Gain competency with data import, processing and analysis with DESeq2 and other Bioconductor packages.
- Understand the structure of count data and metadata required for running analysis.
- Be able to extract, explore, visualize and export results.
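
To make the "normalizing for sequencing depth and gene length" topic concrete, here is a minimal Python sketch of the FPKM calculation. It is illustrative only: the class itself works in R, the function name `fpkm` and the toy counts are hypothetical, and the total number of mapped fragments is assumed to be the sum of the counts supplied. DESeq2 expects raw counts as input and applies its own normalization, so this is not a preprocessing step for that workflow.

```python
from typing import Dict


def fpkm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    FPKM = count * 1e9 / (gene_length_bp * total_mapped_fragments)
    Assumption: total mapped fragments is the sum of the counts passed in.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped fragments")
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total)
        for gene, count in counts.items()
    }


# Hypothetical toy data: geneB is three times longer and has three times
# the fragments, so both genes get the same length-normalized value.
toy_counts = {"geneA": 500, "geneB": 1500}
toy_lengths = {"geneA": 1000, "geneB": 3000}
toy_fpkm = fpkm(toy_counts, toy_lengths)
```

The toy example shows why raw counts alone can mislead: a longer gene collects more fragments at the same expression level, and dividing by length removes that effect.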
**Material**:
- Lecture Slides: [Large PDF]({{ site.baseurl }}/class-material/lecture-14-bggn213_large.pdf){:.no-push-state}{:target="_blank"}, [Small PDF]({{ site.baseurl }}/class-material/lecture-14-bggn213_small.pdf){:.no-push-state}{:target="_blank"},
- Detailed [Bioconductor setup]({{ site.baseurl }}//class-material/bioconductor_setup/){:.no-push-state}{:target="_blank"} instructions.
- Lab: [Hands-on Worksheet]({{ site.baseurl }}/class-material/lab-14-bggn213.html){:.no-push-state}{:target="_blank"},
- Data files: [airway_scaledcounts.csv]({{ site.baseurl }}/class-material/airway_scaledcounts.csv){:.no-push-state}, [airway_metadata.csv]({{ site.baseurl }}/class-material/airway_metadata.csv){:.no-push-state}, [annotables_grch38.csv]({{ site.baseurl }}/class-material/annotables_grch38.csv){:.no-push-state}.
- Feedback: **To Update** [Muddy-Point-Assessment](){:.no-push-state}
**Readings**:
- Excellent review article: [Conesa et al. A survey of best practices for RNA-seq data analysis. _Genome Biology_ 17:13 (2016)](http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0881-8){:.no-push-state}.
- An oldie but a goodie: [Soneson et al. "Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences." _F1000Research_ 4 (2015)](https://f1000research.com/articles/4-1521/v2).
- Abstract and introduction sections of: [Himes et al. "RNA-Seq transcriptome profiling identifies CRISPLD2 as a glucocorticoid responsive gene that modulates cytokine function in airway smooth muscle cells." _PLoS ONE_ 9.6 (2014): e99625](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0099625){:.no-push-state}.
------------------------------------------------------------------------
<a name="16"></a>
16: Genome annotation and the interpretation of gene lists
----------------------------------------------------------
**Topics**: Gene finding and functional annotation from high throughput sequencing data, Functional databases KEGG, InterPro, GO ontologies and functional enrichment
**Goals**: Perform a GO analysis to identify the pathways relevant to a set of genes (e.g. identified by transcriptomic study or a proteomic experiment). Use both Bioconductor packages and online tools to interpret gene lists and annotate potential gene functions.
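
One common way to score functional enrichment of a gene list is an over-representation test based on the hypergeometric distribution. The Python sketch below illustrates that general idea only; it is not the method used by any particular Bioconductor package or online tool from the lab, and the function name `enrichment_pvalue` and the numbers in the example are hypothetical.

```python
from math import comb


def enrichment_pvalue(universe: int, annotated: int, selected: int, overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe  : number of genes in the background set
    annotated : background genes carrying the term of interest
    selected  : size of the gene list being tested
    overlap   : genes in the list that carry the term
    """
    if not (0 <= annotated <= universe and 0 <= selected <= universe):
        raise ValueError("annotated and selected must lie within the universe")
    upper = min(annotated, selected)
    # Count the ways to draw i annotated and (selected - i) other genes.
    favourable = sum(
        comb(annotated, i) * comb(universe - annotated, selected - i)
        for i in range(max(overlap, 0), upper + 1)
    )
    return favourable / comb(universe, selected)


# Hypothetical example: 10 background genes, 5 annotated with the term,
# a list of 5 genes, all 5 of which carry the term.
p_all_five = enrichment_pvalue(universe=10, annotated=5, selected=5, overlap=5)
```

A small p-value suggests the term appears in the gene list more often than expected by chance. In practice many terms are tested at once, so the resulting p-values need multiple-testing correction.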
**Material**:
- Lecture Slides: [Large PDF]({{ site.baseurl }}/class-material/lecture-15-bggn213_large.pdf){:.no-push-state}{:target="_blank"}, [Small PDF]({{ site.baseurl }}/class-material/lecture-15-bggn213_small.pdf){:.no-push-state}{:target="_blank"},
- Lab: [Hands-on Worksheet]({{ site.baseurl }}/class-material/lab-15-bggn213.html){:.no-push-state}{:target="_blank"},
- Data files: [GSE37704_featurecounts.csv]({{ site.baseurl }}/class-material/GSE37704_featurecounts.csv){:.no-push-state}, [GSE37704_metadata.csv]({{ site.baseurl }}/class-material/GSE37704_metadata.csv){:.no-push-state}.
- Feedback: [Muddy-Point-Assessment](){:.no-push-state}
**R Knowledge Check**:
[**Quiz Assessment**](https://forms.gle/1fYii46METGD4y3S7){:.no-push-state}{:target="_blank"},
**Readings**:
- Good review article: Trapnell C, Hendrickson DG, Sauvageau M, Goff L et al. "*Differential analysis of gene regulation at transcript resolution with RNA-seq*". Nat Biotechnol 2013 Jan;31(1):46-53. [PMID: 23222703](https://www.ncbi.nlm.nih.gov/pubmed/23222703){:.no-push-state}.
------------------------------------------------------------------------
<a name="17"></a>
17: Biological network analysis
-------------------------------
**Topics**: Network graph approaches for integrating and interpreting large heterogeneous high throughput data sets; Discovering relationships in 'omics' data; Network construction, manipulation, visualization and analysis; Graph theory; Major network topology measures and concepts (Degree, Communities, Shortest Paths, Centralities, Betweenness, Random graphs vs scale free); Hands-on with Cytoscape and igraph R packages for network visualization and analysis.
**Goals**:
- Be able to describe the major goals of biological network analysis and the concepts underlying network visualization and analysis.
- Be able to use Cytoscape for network visualization and manipulation.
- Be able to find and install Cytoscape Apps to extend network analysis functionality.
- Appreciate that the igraph R package has extensive network analysis functionality beyond that in Cytoscape, and that the Bioconductor package RCy3 allows us to bring networks and associated data from R to Cytoscape so we can have the best of both worlds.
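
Two of the topology measures listed in the topics, degree and shortest paths, can be sketched in a few lines of plain Python. This is a minimal illustration and not the igraph or Cytoscape API: the lab uses those tools, and the helper names `build_graph`, `degree` and `shortest_path_length` and the four-node example network are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, Optional, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected adjacency map from an edge list."""
    graph: Graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def degree(graph: Graph) -> Dict[Hashable, int]:
    """Degree of a node = number of neighbours it is connected to."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def shortest_path_length(graph: Graph, source: Hashable, target: Hashable) -> Optional[int]:
    """Edges on a shortest path (breadth-first search), or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return None


# Hypothetical toy network: C is the best-connected node.
toy_network = build_graph([("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")])
toy_degree = degree(toy_network)
a_to_d = shortest_path_length(toy_network, "A", "D")
```

Breadth-first search is sufficient here because the edges are unweighted; weighted networks would need a different algorithm, such as Dijkstra's.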
**Material**:
- Software Download: [Cytoscape](https://cytoscape.org/download.html){:.no-push-state}{:target="_blank"},
- Lecture Slides: [Large PDF]({{ site.baseurl }}/class-material/lecture-16-bggn213_large.pdf){:.no-push-state}{:target="_blank"}, [Small PDF]({{ site.baseurl }}/class-material/lecture-16-bggn213_small.pdf){:.no-push-state}{:target="_blank"},
- Lab: [Hands-on section worksheet Part 1 (**Networks Visualization**).]({{ site.baseurl }}/class-material/lecture16_bggn213_lab1.pdf){:.no-push-state}{:target="_blank"},
- Lab: [Hands-on section worksheet Part 2 (**Networks Analysis**).]({{ site.baseurl }}/class-material/lecture16_bggn213_lab2.html){:.no-push-state}{:target="_blank"},
- Data files:
- [galFiltered.sif]({{ site.baseurl }}/class-material/galFiltered.sif){:.no-push-state},
- [galExpData.csv]({{ site.baseurl }}/class-material/galExpData.csv){:.no-push-state},
- [CytoscapeDemo_01.cys]({{ site.baseurl }}/class-material/CytoscapeDemo_01.cys){:.no-push-state},
- [virus_prok_cor_abundant.tsv]({{ site.baseurl }}/class-material/virus_prok_cor_abundant.tsv){:.no-push-state},
- [phage_ids_with_affiliation.tsv]({{ site.baseurl }}/class-material/phage_ids_with_affiliation.tsv){:.no-push-state},
- [prok_tax_from_silva.tsv]({{ site.baseurl }}/class-material/prok_tax_from_silva.tsv){:.no-push-state}.
- Feedback: [Muddy-Point-Assessment](){:.no-push-state}
------------------------------------------------------------------------
<a name="0"></a>
Bonus: Essential UNIX for bioinformatics
-------------------------------------------
**Topics**: Bioinformatics on the command line, Why do we use UNIX for bioinformatics? UNIX philosophy, 21 Key commands, Understanding processes, File system structure, Connecting to remote servers, Redirection, streams and pipes, Workflows for batch processing, Organizing computational projects.
**Goals**:
- Understand why we use UNIX for bioinformatics
- Use UNIX command-line tools for file system navigation and text file manipulation.
- Have a familiarity with 21 key UNIX commands that we will use ~90% of the time.
- Be able to connect to remote servers from the command line.
- Use existing programs at the UNIX command line to analyze bioinformatics data.
- Understand IO Redirection, Streams and pipes.
- Understand best practices for organizing computational projects.
**Material**:
- Lecture Slides: [Large PDF]({{ site.baseurl }}/class-material/lecture17_bggn213-large.pdf){:.no-push-state}{:target="_blank"}, [Small PDF]({{ site.baseurl }}/class-material/lecture17_bggn213-small.pdf){:.no-push-state}{:target="_blank"},
- Hands-on section worksheet
* [Using remote UNIX machines (Part I, **REQUIRED**)]({{ site.baseurl }}/class-material/17_blast-01/){:.no-push-state}{:target="_blank"},
* [Using remote UNIX machines (Part II, Optional)]({{ site.baseurl }}/class-material/16_blast-02/){:.no-push-state},
* [Using remote UNIX machines (Part III, Optional)]({{ site.baseurl }}/class-material/16_blast-03/){:.no-push-state}.
- Example data set [bggn213_01_unix.zip]({{ site.baseurl }}/class-material/bggn213_01_unix.zip){:.no-push-state},
- [Muddy point assessment](https://goo.gl/forms/W2G06LVrn2pADB2q1){:.no-push-state}.
## IPs
- (01) **54.68.182.225** (HOLD)
- (02) **54.189.169.250**
- (03) **35.163.70.54**
- (04) **52.43.128.251**
- (05) **34.211.228.140**
- (06) **34.215.96.32**
- (07) **HOLD**
- (08) **34.209.87.222**
- (09) **54.149.119.57**
- (10) **54.188.13.147**
- (11) **18.237.84.45**
- (12) **52.41.50.57**
- (13) **18.236.143.14**
- (14) **35.164.171.59**
- (15) **52.25.18.248**
- (16) **52.10.25.88**
- (17) **52.27.151.124**
- (18) **34.222.134.9**
- (19) **34.221.179.45**
- (20) **18.237.252.10**
- (21) **54.202.63.58**
- (22) **34.221.163.124**
- (23-31) **HOLD-**
- (32) **34.211.23.184**
- (33) **52.27.103.10**
- (34) **34.211.99.242**
- (35) **54.245.222.241**
- (36) **54.149.122.32**
- (37) **54.69.247.45**
- (38) **34.221.177.237**
- (39) **34.222.145.209**
- (40) **18.236.154.101**
- (41) **34.215.234.71**
- (42) **18.237.105.164**
- (43) **54.218.159.21**
- (44) **34.221.162.36**
- (45) **34.217.58.181**
- (46) **54.244.198.91**
- (47) **54.190.60.70**
- (48) **52.43.9.18**
- (49) **34.221.95.56**
- (50) **52.38.53.33** (HOLD)
------------------------------------------------------------------------
<a name="18"></a>
18: Cancer genomics
-------------------
**Topics**: Cancer genomics resources and bioinformatics tools for investigating the molecular basis of cancer. Large scale cancer sequencing projects; NCI Genomic Data Commons; What has been learned from genome sequencing of cancer? **Immunoinformatics, immunotherapy and cancer**; Guest lecture from Dr. Bjoern Peters (LAI): Using genomics and bioinformatics to harness a patient’s own immune system to fight cancer. Implications for the development of personalized medicine.
**N.B.** Find a gene assignment due before next class!
**Material**:
- Lecture Slides: [Large PDF]({{ site.baseurl }}/class-material/lecture-18-bggn213_large.pdf){:.no-push-state}{:target="_blank"}, [Small PDF]({{ site.baseurl }}/class-material/lecture18_bggn213-small.pdf){:.no-push-state}{:target="_blank"},
- Lab: [Hands-on Worksheet Part 1.]({{ site.baseurl }}/class-material/lecture18_part1_BGGN213_W19.html){:.no-push-state}{:target="_blank"},
- Lab: [Hands-on Worksheet Part 2.]({{ site.baseurl }}/class-material/lecture18_part2_BGGN213_W19/){:.no-push-state}{:target="_blank"},
- Data files:
- [lecture18_sequences.fa]({{ site.baseurl }}/class-material/lecture18_sequences.fa){:.no-push-state},
- Solutions:
- Example [mutant identification and subsequence extraction with R]({{ site.baseurl }}/class-material/lecture18_part2_example/){:.no-push-state} walk through.
- [subsequences.fa]({{ site.baseurl }}/class-material/subsequences.fa){:.no-push-state},
- [Solutions.pdf]({{ site.baseurl }}/class-material/Solutions.pdf){:.no-push-state}.
- IEDB HLA binding prediction website [http://tools.iedb.org/mhci/](http://tools.iedb.org/mhci/){:.no-push-state}{:target="_blank"}.
- [GitHub Rmd](https://github.com/bioboot/bggn213_classwork_S19/blob/master/class18/class18.md){:.no-push-state}{:target="_blank"}.
- Feedback: [Muddy-Point-Assessment](https://forms.gle/VgGfkeXrByypzWkj8){:.no-push-state}
------------------------------------------------------------------------
<a name="19"></a>
19: Course summary
------------------
**Topics**: Summary of learning goals, Student course evaluation time and exam preparation; Find a gene assignment due. Open study.
**Material**:
- Lecture Slides: [Large PDF]({{ site.baseurl }}/class-material/lecture-19-bggn213_large.pdf){:.no-push-state}{:target="_blank"},
- Hand-out: [**Exam guidelines, topics, and example questions**]({{ site.baseurl }}/class-material/BGGN213_exam_guidlines.pdf){:.no-push-state}{:target="_blank"},
- Ether-pad: [**Feedback**](https://board.net/p/bggn213_w20){:.no-push-state}{:target="_blank"},
- DataCamp: [**Bioinformatics Extension Track**]({{ site.baseurl }}/class-material/datacamp_extras.pdf){:.no-push-state}{:target="_blank"}.
- Other Resources: [Advanced R book](http://adv-r.had.co.nz){:.no-push-state}{:target="_blank"}, [R for Data Science](https://r4ds.had.co.nz){:.no-push-state}{:target="_blank"};
------------------------------------------------------------------------
<a name="20"></a>
20: Final exam!
---------------
This open-book, open-notes 150-minute test consists of 35 questions. The number of points for each question is indicated in green font at the beginning of each question. There are 80 total points on offer.
Please remember to:
- Read all questions carefully before starting.
- Put your name, UCSD email and PID number on your test.
- Write all your answers in the space provided in the exam paper.
- Remember that concise answers are preferable to wordy ones.
- Clearly state any simplifying assumptions you make in solving a problem.
- No copies of this exam are to be removed from the classroom.
- No talking or communication (electronic or otherwise) with your fellow students once the exam has begun.
- **Good luck!**