Merge pull request #10 from mglbrjs/main

removed base R plotting
gulfofmaine · Jun 28, 2022 · afeee18
2 parents 073b090 + ec5a597

Showing 4 changed files with 3,420 additions and 126 deletions.
diff --git a/Intro_to_R/Intro_to_R-Week1_Data.Rmd b/Intro_to_R/Intro_to_R-Week1_Data.Rmd
@@ -143,9 +143,17 @@ knitr::opts_chunk$set(echo = TRUE)
 
 # First, we will install 2 packages using code. You only need to do this step once. After tidyverse and DataExplorer are installed, you can delete this line of code or comment it out using # (will not be run)
 
+<<<<<<< HEAD
+#install.packages("tidyverse")
+#install.packages("DataExplorer")
+=======
 install.packages("tidyverse")
 install.packages("DataExplorer")
 ```
+<<<<<<< HEAD
+>>>>>>> 073b09069aef56b2e99fa5914301950686f7b36a
+=======
+>>>>>>> 073b09069aef56b2e99fa5914301950686f7b36a
 
 ```{r}
 # Second, you must load the installed packages with library() in your working RStudio session to be able to use them in your code. This must be done in each new R script or RMarkdown document you write.
@@ -174,8 +182,16 @@ getwd()
 
 Use `setwd()` to change your working directory to the location where your data is stored:
 
+<<<<<<< HEAD
+<<<<<<< HEAD
+```{r}
+#setwd("/Users/yourusername/folderwithdata")
+=======
+=======
+>>>>>>> 073b09069aef56b2e99fa5914301950686f7b36a
 ```{r, eval=FALSE}
 setwd("/Users/yourusername/folderwithdata")
+>>>>>>> 073b09069aef56b2e99fa5914301950686f7b36a
 ```
 
 ### Load data by copying and pasting (use `datapasta` package)
@@ -187,8 +203,8 @@ setwd("/Users/yourusername/folderwithdata")
 Pick `Paste as tribble`:
 
 ```{r}
-# install.packages("datapasta", repos = c(mm = "https://milesmcbain.r-universe.dev", getOption("repos"))) # Uncomment to install most up-to-date version of package
-# library("datapasta") # Uncomment if not already loaded
+ #install.packages("datapasta", repos = c(mm = "https://milesmcbain.r-universe.dev", getOption("repos"))) # Uncomment to install most up-to-date version of package
+ library("datapasta") # Uncomment if not already loaded
 
 tibble::tribble(
                                  ~Breed, ~Affectionate.With.Family, ~Good.With.Young.Children, ~Good.With.Other.Dogs, ~Shedding.Level, ~Coat.Grooming.Frequency, ~Drooling.Level, ~Coat.Type, ~Coat.Length, ~Openness.To.Strangers, ~Playfulness.Level, ~`Watchdog/Protective.Nature`, ~Adaptability.Level, ~Trainability.Level, ~Energy.Level, ~Barking.Level, ~Mental.Stimulation.Needs,
@@ -271,8 +287,18 @@ data.table::data.table(
 ### Load data from a CSV (.csv) file
 
 ```{r}
+<<<<<<< HEAD
+<<<<<<< HEAD
+ library(readr) # If the readr package is not loaded, uncomment and run this line of code
+penguinsCSV <- read_csv("penguins_data.csv")
+=======
+# library(readr) # If the readr package is not loaded, uncomment and run this line of code
+penguinsCSV <- read_csv(here::here("Intro_to_R", "Data/penguins_data.csv"))
+>>>>>>> 073b09069aef56b2e99fa5914301950686f7b36a
+=======
 # library(readr) # If the readr package is not loaded, uncomment and run this line of code
 penguinsCSV <- read_csv(here::here("Intro_to_R", "Data/penguins_data.csv"))
+>>>>>>> 073b09069aef56b2e99fa5914301950686f7b36a
 ```
 
 The text below the code gives you information about what happened when you ran the code. Sometimes you'll get an error or warning message, but in this case, the output is telling you what variable types it assigned to each column.
@@ -390,7 +416,7 @@ Here are a few functions to help you take a first look at your data quickly:
 
 ```{r, results='hide'}
 # install.packages("DataExplorer")
-# library(DataExplorer) # Uncomment if package isn't loaded in R
+ library(DataExplorer) # Uncomment if package isn't loaded in R
 DataExplorer::create_report(penguins)
 ```
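
Several hunks in this file add Git conflict markers (`<<<<<<< HEAD`, `=======`, and `>>>>>>>` followed by a commit hash). Inside a code chunk these lines are not valid R, so the chunk cannot be parsed until they are resolved. A minimal sketch of a check that could be run before committing is shown here; the function name `find_conflict_markers()` and the temporary demo file are hypothetical and not part of this repository.

```r
# Hypothetical helper (not part of the repository): report lines in a text
# file that look like leftover Git conflict markers.
find_conflict_markers <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # Markers are exactly seven '<', '=' or '>' characters at the start of a line.
  # A bare "=======" can also be a Markdown heading underline, so treat
  # hits as candidates to inspect rather than as certain conflicts.
  hits <- grep("^(<{7}( |$)|={7}$|>{7}( |$))", lines)
  data.frame(line = hits, text = lines[hits], stringsAsFactors = FALSE)
}

# Demo on a temporary file that mimics the install chunk in the diff above.
demo_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "<<<<<<< HEAD",
  "#install.packages(\"tidyverse\")",
  "=======",
  "install.packages(\"tidyverse\")",
  ">>>>>>> 073b09069aef56b2e99fa5914301950686f7b36a"
), demo_file)

markers <- find_conflict_markers(demo_file)
print(markers)
```

An empty result means no candidate markers were found; any rows returned point at lines to resolve by hand.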
 

diff --git a/Intro_to_R/Intro_to_R-Week1_Data.html b/Intro_to_R/Intro_to_R-Week1_Data.html
diff --git a/Intro_to_R/Intro_to_R-Week2_Plots_and_Stats.Rmd b/Intro_to_R/Intro_to_R-Week2_Plots_and_Stats.Rmd
@@ -11,10 +11,8 @@ editor_options:
 ## Goals:
 
 1\. Describe
-<!--# This is probably review of data exploration from week 1 Add exploratory plots here, intro to ggplot here rather than later -->
 
 2\. Wrangle
-<!--# This is probably review of tidyverse from week 1, need to add penguins.csv data to GitHub so this code runs -->
 
 3\. Visualize variation
 
@@ -105,8 +103,6 @@ table(penguins$flipper_length_mm)
 
 ## Step 2: Wrangle
 
-<!--# Add tibble/tidy data here so we have something to work with for visualizing -->
-
 Let's get rid of any individuals with an NA in the sex or body_mass_g
 columns and save that to a new dataframe.
 
@@ -135,115 +131,14 @@ adelie_penguins<-penguins_complete %>%
 Let's look at the distribution of body mass for all species using the
 hist() function
 
-<!--# Use histogram to talk about base R plots -->
-
 `hist()` - This function computes a histogram of the given data values.
 The argument must be a numeric vector (a column of data is a vector).
 
-```{r}
-hist(penguins_complete$body_mass_g)
-```
-
 This and many other plots can be created using base R functions, but
 `ggplot()` provides a more consistent framework to generate and combine
 many different plots using tidy data as an input.
 
-<!--# Add histogram created with ggplot - pull code from later in this document -->
-
-By looking at these plot, can you guess what the mean might be? We can
-calculate it and check.
-
-```{r}
-mean(penguins_complete$body_mass_g)
-```
-
-However, we might want to investigate each species a little more
-specifically. Based on what little I know about penguins, I am think
-that one species is a quite a bit bigger (thus probably weighs more)
-than the others.
-
-<!--# Replace boxplot with ggplot -->
-
-**boxplot** is a function which allows you to produce box-and-whisker
-plots of the given (grouped) values
-
-The argument is a formula which specifies which grouping variable you
-want to divide a numeric vector by.
-
-```{r}
-boxplot(penguins_complete$body_mass_g ~ penguins_complete$species)
-```
-
-We can use group_by and summarise to calculate group means.
-
-```{r}
-penguins_complete %>% 
-  group_by(species) %>% 
-  summarize(mean_mass = mean(body_mass_g))
-
-```
-
-We could also guess that sexual dimorphism would cause variation is mass
-between the sexes (i.e., males are typically larger).
-
-Let's look at those differences for Adelie penguins only.
-
-```{r}
-boxplot(adelie_penguins$body_mass_g ~ adelie_penguins$sex)
-```
-
-We can use group_by and summarise to calculate group means here as well.
-
-```{r}
-adelie_penguins %>% 
-  group_by(sex) %>% 
-  summarize(mean_mass = mean(body_mass_g), sd=sd(body_mass_g))
-
-```
-
-## Step 4: Visualize covariation
-
-Now we may want to evaluate covariation between numeric variables, for
-example, we could examine the relationship between flipper length and
-body mass. What kind of relationship would you expect?
-
-<!--# Probably want to replace the following with ggplot -->
-
-**plot** is the generic base R plotting function that can be used to
-create scatterplots, line plots, and more
-
-The first argument is the x coordinates of the points in the plot
-
-The second argument is the y coordinates of points in the plot
-
-type = is an argument that tells R which type of plot should be drawn;
-common options are 'p' for points, 'l' for lines, or 'n' for no plotting
-(blank plotting area)
-
-```{r}
-plot(penguins_complete$flipper_length_mm, penguins_complete$body_mass_g, type = "p")
-```
-
-Let's addd some color by species and formatting
-
-```{r}
-plot(penguins_complete$flipper_length_mm, penguins_complete$body_mass_g, type = "n",xlab = "Flipper length (mm)", ylab = "Body mass (g)")
-
-points(penguins_complete$flipper_length_mm[penguins_complete$species == "Adelie"], penguins_complete$body_mass_g[penguins_complete$species == "Adelie"], col = "red")
-
-points(penguins_complete$flipper_length_mm[penguins_complete$species == "Chinstrap"], penguins_complete$body_mass_g[penguins_complete$species == "Chinstrap"], col = "green")
-
-points(penguins_complete$flipper_length_mm[penguins_complete$species == "Gentoo"], penguins_complete$body_mass_g[penguins_complete$species == "Gentoo"], col = "blue")
-
-legend(220, 3800, legend=c("Adelie", "Chinstrap", "Gentoo"),
-       col=c("red", "green", "blue"),  pch = 1,   bty = "n")
-
-```
-
-### ggplot
-
-Now, I want to show how you'd make the same plots using the ggplot
-syntax. For this, you'll need the ggplot2 package which is loaded as
+For this, you'll need the ggplot2 package which is loaded as
 part of the tidyverse.
 
 There are 4 aspects of ggplots you need to know about:
@@ -294,23 +189,71 @@ ggplot(data=penguins_complete, aes(x=body_mass_g)) + geom_histogram(bins=8)
 # delete or change the bins argument to see what happens
 ```
 
+By looking at this plot, can you guess what the mean might be? We can
+calculate it and check.
+
+```{r}
+mean(penguins_complete$body_mass_g)
+```
+
+However, we might want to investigate each species a little more
+specifically. Based on what little I know about penguins, I think
+that one species is quite a bit bigger (and thus probably weighs more)
+than the others.
+
+```{r}
+
+ggplot(data=penguins_complete, aes(x=species, y=body_mass_g)) + geom_boxplot() + stat_boxplot(geom ='errorbar', width = 0.5)
+
+```
+
+We can use group_by and summarise to calculate group means.
+
+```{r}
+
+penguins_complete %>% 
+  group_by(species) %>% 
+  summarize(mean_mass = mean(body_mass_g))
+
+```
+
+We could also guess that sexual dimorphism would cause variation in mass
+between the sexes (i.e., males are typically larger).
+
+Let's look at those differences for Adelie penguins only.
+
 If you think about the way a boxplot is formatted, the categories are
 typically on the x axis and the numeric variable's range is on the y axis.
 
 ```{r}
-boxplot(adelie_penguins$body_mass_g ~ adelie_penguins$sex)
 
 ggplot(data=adelie_penguins, aes(x=sex, y=body_mass_g)) + geom_boxplot() + stat_boxplot(geom ='errorbar', width = 0.5)
+
+```
+
+We can use group_by and summarise to calculate group means here as well.
+
+```{r}
+adelie_penguins %>% 
+  group_by(sex) %>% 
+  summarize(mean_mass = mean(body_mass_g), sd=sd(body_mass_g))
+
 ```
 
+## Step 4: Visualize covariation
+
+Now we may want to evaluate covariation between numeric variables, for
+example, we could examine the relationship between flipper length and
+body mass. What kind of relationship would you expect?
+
 ggplot excels at multi-variable plots as compared to base graphics.
 
 Here's the basic scatterplot:
 
 ```{r}
-plot(penguins_complete$flipper_length_mm, penguins_complete$body_mass_g, type = "p")
 
 ggplot(data=penguins_complete, aes(x=flipper_length_mm, y=body_mass_g)) + geom_point()
+
 ```
 
 And the more advanced versions
@@ -319,23 +262,9 @@ Note: ggplot will automatically assign colors, but you can manually
 change them as well (will discuss later)
 
 ```{r}
-plot(penguins_complete$flipper_length_mm, penguins_complete$body_mass_g, type = "n",xlab = "Flipper length (mm)", ylab = "Body mass (g)")
-
-points(penguins_complete$flipper_length_mm[penguins_complete$species == "Adelie"], penguins_complete$body_mass_g[penguins_complete$species == "Adelie"], col = "red")
-
-points(penguins_complete$flipper_length_mm[penguins_complete$species == "Chinstrap"], penguins_complete$body_mass_g[penguins_complete$species == "Chinstrap"], col = "green")
-
-points(penguins_complete$flipper_length_mm[penguins_complete$species == "Gentoo"], penguins_complete$body_mass_g[penguins_complete$species == "Gentoo"], col = "blue")
-
-legend(220, 3800, legend=c("Adelie", "Chinstrap", "Gentoo"),
-       col=c("red", "green", "blue"),  pch = 1,   bty = "n")
-
-
-
-
-# ggplot version:
 
 ggplot(data = penguins_complete, aes(x=flipper_length_mm, y=body_mass_g,color=species)) + geom_point()
+
 ```
 
 `geom_line()` will create a line graph (connect each datapoint with a

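The Week 2 hunks above drop the base R calls (`hist()`, `boxplot()`, and `plot()` with `points()` and `legend()`) and keep only their ggplot2 counterparts. A minimal sketch of that mapping follows, assuming ggplot2 is installed. `toy_penguins` is a made-up placeholder data frame, not the real penguins data, used only so the block runs on its own.

```r
library(ggplot2)

# Placeholder values for illustration only; these are NOT real measurements.
toy_penguins <- data.frame(
  species = rep(c("Adelie", "Chinstrap", "Gentoo"), each = 3),
  flipper_length_mm = c(181, 186, 195, 192, 196, 201, 211, 217, 230),
  body_mass_g = c(3750, 3800, 3250, 3500, 3900, 3650, 4500, 5000, 5700)
)

# hist(x) becomes a histogram layer
p_hist <- ggplot(toy_penguins, aes(x = body_mass_g)) +
  geom_histogram(bins = 4)

# boxplot(y ~ group) becomes a boxplot layer with the group on the x axis
p_box <- ggplot(toy_penguins, aes(x = species, y = body_mass_g)) +
  geom_boxplot()

# plot(x, y) plus per-species points() and legend() becomes one color aesthetic
p_scatter <- ggplot(toy_penguins,
                    aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  geom_point() +
  labs(x = "Flipper length (mm)", y = "Body mass (g)")

print(p_scatter)
```

Because `species` is mapped to `color` once inside `aes()`, ggplot2 builds the legend itself, which is why the three separate `points()` calls and the manual `legend()` are no longer needed.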
diff --git a/Intro_to_R/report.html b/Intro_to_R/report.html