- Notes:
- Tentative calendar (weekly topics), subject to change depending on the pace of the course.
- Notes (📁) cover material discussed in class.
- Reading (📖) includes material that expands on lecture topics, as well as coding examples that you should practice on your own.
- Misc (📰) is supporting material worth taking a look at.
- 📇 Dates: Jan 17-19
- 📎 Topics: Introduction, course in a nutshell, and policies/logistics. Please take some time outside class to review the course policies, the Piazza etiquette rules, and the FAQs.
- 📁 Notes:
- About the Course (slides)
- Introduction: Big Picture (slides)
- 📖 Reading:
- 🔬 Lab: No lab
- 📰 Misc:
- 🔈 To Do:
- 📇 Dates: Jan 22-26
- 📎 Topics: First things first: we begin with some basic survival skills for R, followed by an overall review of the RStudio workspace. Then we move on to discuss basic data types and how R implements them with vectors. Likewise, we cover fundamental concepts like atomicity, vectorization, recycling, and subsetting.
- 📁 Notes:
- First contact with R (tutorial)
- Intro to Rmd files (tutorial)
- Data Types and Vectors (slides)
- 📖 Reading:
- www.markdowntutorial.com
- Markdown tutorial (by CommonMark)
- 🔬 Lab:
- 📰 Misc:
- Introduction to R Markdown (by RStudio)
- 💡 Cheat sheet:
- 🎯 WARM-UP 1:
- Markdown practice (due Feb-02)
- 📇 Dates: Jan 29-Feb 02
- 📎 Topics: We review more data structures, such as arrays, factors, and lists, and discuss the traditional base graphics approach, which is built on R vectors.
- 📁 Notes:
- Arrays and Factors (slides)
- Lists (slides)
- Base Graphics I (slides)
- Base Graphics II (slides)
- 📖 Reading:
- Intro to vectors (tutorial)
- 🔬 Lab:
- 📰 Misc:
- chapter 20: Vectors (R for Data Science by Grolemund and Wickham)
- 💡 Cheat sheet:
- 🎯 WARM-UP 2:
- Vectors and Factors (due Feb-09)
- 📇 Dates: Feb 05-09
- 📎 Topics: Data Analysis Projects (DAPs) are made of files and directories. Therefore, we need to review some fundamental concepts such as the file system, the command line, and the basics of version control systems.
- 📁 Notes:
- Filesystem Basics (slides)
- Shell Basics (slides)
- Working with files (slides)
- Git Basics (slides)
- 📖 Reading:
- The Unix Shell lessons 1-3 (by Software Carpentry)
- Linux Tutorial lessons 1-5 (by Ryan Chadwick)
- 🔬 Lab:
- 📰 Misc:
- Read sections 4 to 9 in Part I Installation (Happy Git and GitHub for the useR by Jenny Bryan et al.)
- 💡 Cheat sheet:
- 📇 Dates: Feb 12-16
- 📎 Topics: Tables are the most common form in which data is stored, handled, and manipulated. Consequently, we need to talk about the typical storage formats of tabular data, and the relationship between tables and R data frames. In addition, we cover Principal Components Analysis (PCA), an unsupervised learning technique for summarizing the systematic structure of a table of quantitative variables.
- 📁 Notes:
- Data Tables (slides)
- Importing Tables in R (slides)
- Principal Component Analysis 1 (slides)
- Principal Component Analysis 2 (slides)
- 📖 Reading:
- Basic manipulation of Data Frames (slides)
- Organizing data in spreadsheets (by Karl Broman)
- 🔬 Lab:
- 📰 Misc:
- Data Import (R for Data Science by Grolemund and Wickham)
- 💡 Cheat sheet:
- 🎯 HW 1: due Feb-23
- TBA
- 📇 Dates: Feb 19-23 (Holiday Feb-19)
- 📎 Topics: We continue reviewing the manipulation of data frames with the data-plying framework provided by the package "dplyr". Likewise, we review the visualization paradigm of "ggplot2", which is based on data frames. In addition, we'll briefly introduce cluster analysis, the other major flavor of unsupervised learning, which is used to find groups in data.
- 📁 Notes:
- Introduction to "dplyr" (tutorial)
- Grammar of Graphics framework (slides)
- Cluster Analysis
- 📖 Reading:
- "dplyr" tutorial (by Hadley Wickham)
- "ggplot2" lecture (by Karthik Ram)
- 🔬 Lab:
- 📰 Misc:
- Introduction to dplyr (by Hadley Wickham)
- 💡 Cheat sheet:
- 📇 Dates: Feb 26-Mar 02
- 📎 Topics: You don’t need to be an expert programmer to be a data scientist, but learning more about programming allows you to automate common tasks, and solve new problems with greater ease. We'll discuss how to write basic functions, the notion of R expressions, and an introduction to conditionals.
- 📁 Notes:
- Introduction to functions (tutorial)
- Introduction to R expressions and conditionals (tutorial)
- 🔬 Lab:
- 📰 Misc:
- chapter 19: Functions (R for Data Science by Grolemund and Wickham)
- 🎯 HW 2: due Mar-09
- TBA
- 📇 Dates: Mar 05-09
- 📎 Topics: In addition to writing functions to reduce duplication in your code, you also need to learn about iteration, which helps when you need to perform the same operation several times. Specifically, we review iteration structures such as "for" loops, "while" loops, and "repeat" loops, as well as the "apply" family of functions.
- 📁 Notes:
- Introduction to loops (tutorial)
- More about functions (tutorial)
- 🔬 Lab:
- 📰 Misc:
- chapter 21: Iteration (R for Data Science by Grolemund and Wickham)
- 🎓 MIDTERM 1: Friday Mar-09
- 📇 Dates: Mar 12-16
- 📎 Topics: At its heart, computing involves working with numbers. However, a considerable amount of information and data is in the form of text. Therefore, you also need to learn about character strings and how to perform basic string manipulation. In parallel, we'll keep working on writing functions, with a particular focus on testing them.
- 📁 Notes:
- String Basics (slides)
- Intro to Strings (tutorial)
- Getting started with testing (by Wickham)
- 📖 Reading:
- Handling Strings in R (by Sanchez)
- 🔬 Lab:
- 📰 Misc:
- chapter 14: Strings (R for Data Science by Grolemund and Wickham)
- 🎯 HW 3: due Mar-23
- TBA
- 📇 Dates: Mar 19-23
- 📎 Topics: To unleash the power of string manipulation, we need to take things to the next level and learn about regular expressions (regex), a tool for describing patterns of text. We'll describe the basic concepts of regex and the common operations to match text patterns.
- 📁 Notes:
- Introduction to regular expressions
- Regexpal tester tool.
- 📖 Reading:
- Handling Strings in R (by Sanchez)
- 🔬 Lab:
- TBA
- 📰 Misc:
- Handling Strings in R (by Sanchez)
- 💡 Cheat sheet:
- 📇 Dates: Mar 26-30
- 🔋 (Re)charge your batteries!
- 🎯 HW 4: due Apr-06
- TBA
- 📇 Dates: Apr 02-06
- 📎 Topics: Random numbers have many applications in science and computer programming, especially when there are significant uncertainties in a phenomenon of interest. In this part of the course we'll look at some basic problems involving working with random numbers and creating simulations.
- 📁 Notes:
- TBA
- 📖 Reading:
- TBA
- 🔬 Lab:
- TBA
- 📇 Dates: Apr 09-13
- 📎 Topics: Shiny apps are a nice companion to R, making it quick and simple to deliver interactive analyses and graphics in any web browser. We'll review how to create simple Shiny apps to display data summaries, queries, and interactive displays.
- 📁 Notes:
- Shiny tutorial (by Grolemund)
- 📖 Reading:
- 🔬 Lab:
- TBA
- 📰 Misc:
- 🎯 HW 5: due Apr-20
- TBA
- 📇 Dates: Apr 16-20
- 📎 Topics: Packages are the fundamental units of reproducible R code. They include reusable functions, the documentation that describes how to use them, and sample data. In this part of the course, we'll start learning how to turn your code into an R package.
- 📁 Notes:
- TBA
- 📖 Reading:
- Package Structure (R packages by Wickham)
- See package components: http://r-pkgs.had.co.nz/ (R packages by Wickham)
- 🔬 Lab:
- TBA
- 📇 Dates: Apr 23-27
- 📎 Topics: Creating an R package can seem overwhelming at first. So we'll keep working on the creation of a relatively basic package. This will give you the opportunity to apply most of the concepts seen in the course.
- 📁 Notes:
- TBA
- 📖 Reading:
- See package components: http://r-pkgs.had.co.nz (R packages by Wickham)
- 🔬 Lab:
- TBA
- 🎯 HW 6: due Apr-27
- TBA
- 📇 Dates: Apr 30-May 04
- 📎 Topics: Preparation for the final examination
- 📁 Notes:
- No lecture. Instructor will hold OH (in 309 Evans)
- 🎓 FINAL: Mon May 7, 8-11am (room TBA)