Merge pull request #10 from mglbrjs/main
removed base R plotting
mglbrjs authored Jun 28, 2022
2 parents 073b090 + ec5a597 commit afeee18
Showing 4 changed files with 3,420 additions and 126 deletions.
32 changes: 29 additions & 3 deletions Intro_to_R/Intro_to_R-Week1_Data.Rmd
@@ -143,9 +143,17 @@ knitr::opts_chunk$set(echo = TRUE)
# First, we will install 2 packages using code. You only need to do this step once. After tidyverse and DataExplorer are installed, you can delete this line of code or comment it out using # (will not be run)
#install.packages("tidyverse")
#install.packages("DataExplorer")
```
```{r}
# Second, you must load the installed packages into your library (working RStudio session) to be able to use them in your code. This must be done in each new R script or RMarkdown document you write.
@@ -174,8 +182,16 @@ getwd()

Use `setwd()` to change your working directory to the location where your data is stored:

```{r, eval=FALSE}
setwd("/Users/yourusername/folderwithdata")
```
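
As an alternative to changing the working directory, you can leave it alone and build the path to the file instead. This is only a sketch: the folder and file names below are placeholders, not paths that are guaranteed to exist on your machine.

```{r}
# Hypothetical folder and file names; replace them with your own
data_dir <- file.path("~", "folderwithdata")
csv_path <- file.path(data_dir, "penguins_data.csv")

getwd()                # where R is currently looking
file.exists(csv_path)  # TRUE only if the file is really there
```

If `file.exists()` returns `FALSE`, the path is wrong and any attempt to read the file will fail for the same reason.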

### Load data by copying and pasting (use `datapasta` package)
@@ -187,8 +203,8 @@ setwd("/Users/yourusername/folderwithdata")
Pick `Paste as tribble`:

```{r}
# install.packages("datapasta", repos = c(mm = "https://milesmcbain.r-universe.dev", getOption("repos"))) # Uncomment to install most up-to-date version of package
# library("datapasta") # Uncomment if not already loaded
#install.packages("datapasta", repos = c(mm = "https://milesmcbain.r-universe.dev", getOption("repos"))) # Uncomment to install most up-to-date version of package
library("datapasta") # Load the package (harmless if it is already loaded)
tibble::tribble(
~Breed, ~Affectionate.With.Family, ~Good.With.Young.Children, ~Good.With.Other.Dogs, ~Shedding.Level, ~Coat.Grooming.Frequency, ~Drooling.Level, ~Coat.Type, ~Coat.Length, ~Openness.To.Strangers, ~Playfulness.Level, ~`Watchdog/Protective.Nature`, ~Adaptability.Level, ~Trainability.Level, ~Energy.Level, ~Barking.Level, ~Mental.Stimulation.Needs,
@@ -271,8 +287,18 @@ data.table::data.table(
### Load data from a CSV (.csv) file

```{r}
# library(readr) # If the readr package is not loaded, uncomment and run this line of code
penguinsCSV <- read_csv(here::here("Intro_to_R", "Data/penguins_data.csv"))
```

The text below the code gives you information about what happened when you ran the code. Sometimes you'll get an error or warning message, but in this case, the output is telling you what variable types it assigned to each column.
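
If the guessed column types are not what you want, `read_csv()` accepts a `col_types` argument. The sketch below uses a made-up two-row inline CSV (wrapped in `I()`, which readr 2.0 and later treat as literal data) rather than the course file, so the column names and values are assumptions for illustration only.

```{r}
library(readr)

# Made-up inline data standing in for a real file
toy_csv <- I("species,body_mass_g\nAdelie,3750\nGentoo,5000\n")

toy <- read_csv(
  toy_csv,
  col_types = cols(
    species     = col_factor(),  # store as a factor instead of character
    body_mass_g = col_double()   # store as a number
  )
)

spec(toy)  # shows the column types that were actually used
```

Because the types are stated explicitly, `read_csv()` no longer needs to guess them from the data.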
@@ -390,7 +416,7 @@ Here are a few functions to help you take a first look at your data quickly:

```{r, results='hide'}
# install.packages("DataExplorer")
# library(DataExplorer) # Uncomment if package isn't loaded in R
library(DataExplorer) # Load the package (harmless if it is already loaded)
DataExplorer::create_report(penguins)
```
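
For a lighter first look that needs no extra packages, base R has a few helpers. This is a sketch on a made-up three-row data frame (`toy_penguins` is not the real penguins data):

```{r}
# Made-up rows for illustration only
toy_penguins <- data.frame(
  species     = c("Adelie", "Gentoo", "Chinstrap"),
  body_mass_g = c(3750, NA, 3600)
)

str(toy_penguins)      # column names, types, and first values
summary(toy_penguins)  # per-column summaries, including NA counts for numeric columns

# Number of missing values in each column
na_counts <- colSums(is.na(toy_penguins))
na_counts
```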

1,673 changes: 1,673 additions & 0 deletions Intro_to_R/Intro_to_R-Week1_Data.html

Large diffs are not rendered by default.

175 changes: 52 additions & 123 deletions Intro_to_R/Intro_to_R-Week2_Plots_and_Stats.Rmd
@@ -11,10 +11,8 @@ editor_options:
## Goals:

1\. Describe
<!--# This is probably review of data exploration from week 1 Add exploratory plots here, intro to ggplot here rather than later -->

2\. Wrangle
<!--# This is probably review of tidyverse from week 1, need to add penguins.csv data to GitHub so this code runs -->

3\. Visualize variation

@@ -105,8 +103,6 @@ table(penguins$flipper_length_mm)

## Step 2: Wrangle

<!--# Add tibble/tidy data here so we have something to work with for visualizing -->

Let's get rid of any individuals with an NA in the sex or body_mass_g
columns and save that to a new dataframe.
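
The chunk that does this is not shown in this diff. One way such a filter could be written is sketched below on made-up rows; `toy_penguins` and `toy_complete` are hypothetical names, not objects from the course.

```{r}
library(dplyr)

# Made-up rows, not the real penguins data
toy_penguins <- tibble(
  species     = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  sex         = c("male", NA, "female", "male"),
  body_mass_g = c(3750, 3800, NA, 5400)
)

# Keep only rows where both sex and body_mass_g are present
toy_complete <- toy_penguins %>%
  filter(!is.na(sex), !is.na(body_mass_g))

nrow(toy_complete)  # 2 of the 4 rows remain
```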

@@ -135,115 +131,14 @@ adelie_penguins<-penguins_complete %>%
Let's look at the distribution of body mass for all species using the
hist() function

<!--# Use histogram to talk about base R plots -->

`hist()` - This function computes a histogram of the given data values.
The argument must be a numeric vector (a column of data is a vector).

```{r}
hist(penguins_complete$body_mass_g)
```

This and many other plots can be created using base R functions, but
`ggplot()` provides a more consistent framework to generate and combine
many different plots using tidy data as an input.

<!--# Add histogram created with ggplot - pull code from later in this document -->

By looking at this plot, can you guess what the mean might be? We can
calculate it and check.

```{r}
mean(penguins_complete$body_mass_g)
```

However, we might want to investigate each species a little more
specifically. Based on what little I know about penguins, I think
that one species is quite a bit bigger (and thus probably weighs more)
than the others.

<!--# Replace boxplot with ggplot -->

**boxplot** is a function which allows you to produce box-and-whisker
plots of the given (grouped) values

The argument is a formula which specifies which grouping variable you
want to divide a numeric vector by.

```{r}
boxplot(penguins_complete$body_mass_g ~ penguins_complete$species)
```

We can use `group_by()` and `summarize()` to calculate group means.

```{r}
penguins_complete %>%
  group_by(species) %>%
  summarize(mean_mass = mean(body_mass_g))
```

We could also guess that sexual dimorphism would cause variation in mass
between the sexes (i.e., males are typically larger).

Let's look at those differences for Adelie penguins only.

```{r}
boxplot(adelie_penguins$body_mass_g ~ adelie_penguins$sex)
```

We can use `group_by()` and `summarize()` to calculate group means here as well.

```{r}
adelie_penguins %>%
  group_by(sex) %>%
  summarize(mean_mass = mean(body_mass_g), sd = sd(body_mass_g))
```

## Step 4: Visualize covariation

Now we may want to evaluate covariation between numeric variables, for
example, we could examine the relationship between flipper length and
body mass. What kind of relationship would you expect?

<!--# Probably want to replace the following with ggplot -->

**plot** is the generic base R plotting function that can be used to
create scatterplots, line plots, and more

The first argument is the x coordinates of the points in the plot

The second argument is the y coordinates of points in the plot

type = is an argument that tells R which type of plot should be drawn;
common options are 'p' for points, 'l' for lines, or 'n' for no plotting
(blank plotting area)

```{r}
plot(penguins_complete$flipper_length_mm, penguins_complete$body_mass_g, type = "p")
```

Let's add some color by species and some formatting

```{r}
plot(penguins_complete$flipper_length_mm, penguins_complete$body_mass_g, type = "n",xlab = "Flipper length (mm)", ylab = "Body mass (g)")
points(penguins_complete$flipper_length_mm[penguins_complete$species == "Adelie"], penguins_complete$body_mass_g[penguins_complete$species == "Adelie"], col = "red")
points(penguins_complete$flipper_length_mm[penguins_complete$species == "Chinstrap"], penguins_complete$body_mass_g[penguins_complete$species == "Chinstrap"], col = "green")
points(penguins_complete$flipper_length_mm[penguins_complete$species == "Gentoo"], penguins_complete$body_mass_g[penguins_complete$species == "Gentoo"], col = "blue")
legend(220, 3800, legend=c("Adelie", "Chinstrap", "Gentoo"),
col=c("red", "green", "blue"), pch = 1, bty = "n")
```

### ggplot

Now, I want to show how you'd make the same plots using the ggplot
syntax. For this, you'll need the ggplot2 package which is loaded as
For this, you'll need the ggplot2 package which is loaded as
part of the tidyverse.

There are 4 aspects of ggplots you need to know about:
@@ -294,23 +189,71 @@ ggplot(data=penguins_complete, aes(x=body_mass_g)) + geom_histogram(bins=8)
# delete or change the bins argument to see what happens
```

By looking at this plot, can you guess what the mean might be? We can
calculate it and check.

```{r}
mean(penguins_complete$body_mass_g)
```

However, we might want to investigate each species a little more
specifically. Based on what little I know about penguins, I think
that one species is quite a bit bigger (and thus probably weighs more)
than the others.

```{r}
ggplot(data=penguins_complete, aes(x=species, y=body_mass_g)) + geom_boxplot() + stat_boxplot(geom ='errorbar', width = 0.5)
```

We can use `group_by()` and `summarize()` to calculate group means.

```{r}
penguins_complete %>%
  group_by(species) %>%
  summarize(mean_mass = mean(body_mass_g))
```

We could also guess that sexual dimorphism would cause variation in mass
between the sexes (i.e., males are typically larger).

Let's look at those differences for Adelie penguins only.

If you think about the way a boxplot is formatted, the categories are
typically on the x axis and the numeric variable's range is on the y axis.

```{r}
boxplot(adelie_penguins$body_mass_g ~ adelie_penguins$sex)
ggplot(data=adelie_penguins, aes(x=sex, y=body_mass_g)) + geom_boxplot() + stat_boxplot(geom ='errorbar', width = 0.5)
```

We can use `group_by()` and `summarize()` to calculate group means here as well.

```{r}
adelie_penguins %>%
  group_by(sex) %>%
  summarize(mean_mass = mean(body_mass_g), sd = sd(body_mass_g))
```
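
The same pattern extends to more than one grouping variable and more than one statistic. A sketch on made-up rows (`toy_penguins` and `toy_summary` are hypothetical names, and the masses are invented):

```{r}
library(dplyr)

# Made-up rows, not the real penguins data
toy_penguins <- tibble(
  species     = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"),
  sex         = c("female", "male", "male", "female", "female", "male"),
  body_mass_g = c(3400, 4000, 4100, 4700, 4800, 5500)
)

# One row per species/sex combination, with a count and a mean
toy_summary <- toy_penguins %>%
  group_by(species, sex) %>%
  summarize(
    n         = n(),
    mean_mass = mean(body_mass_g),
    .groups   = "drop"  # return an ungrouped tibble
  )

toy_summary
```

Including `n()` alongside the mean makes it easy to spot groups whose average rests on very few individuals.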

## Step 4: Visualize covariation

Now we may want to evaluate covariation between numeric variables, for
example, we could examine the relationship between flipper length and
body mass. What kind of relationship would you expect?

ggplot excels at multi-variable plots as compared to base graphics.

Here's the basic scatterplot:

```{r}
plot(penguins_complete$flipper_length_mm, penguins_complete$body_mass_g, type = "p")
ggplot(data=penguins_complete, aes(x=flipper_length_mm, y=body_mass_g)) + geom_point()
```

And the more advanced versions
@@ -319,23 +262,9 @@ Note: ggplot will automatically assign colors, but you can manually
change them as well (will discuss later)

```{r}
plot(penguins_complete$flipper_length_mm, penguins_complete$body_mass_g, type = "n",xlab = "Flipper length (mm)", ylab = "Body mass (g)")
points(penguins_complete$flipper_length_mm[penguins_complete$species == "Adelie"], penguins_complete$body_mass_g[penguins_complete$species == "Adelie"], col = "red")
points(penguins_complete$flipper_length_mm[penguins_complete$species == "Chinstrap"], penguins_complete$body_mass_g[penguins_complete$species == "Chinstrap"], col = "green")
points(penguins_complete$flipper_length_mm[penguins_complete$species == "Gentoo"], penguins_complete$body_mass_g[penguins_complete$species == "Gentoo"], col = "blue")
legend(220, 3800, legend=c("Adelie", "Chinstrap", "Gentoo"),
col=c("red", "green", "blue"), pch = 1, bty = "n")
# ggplot version:
ggplot(data = penguins_complete, aes(x=flipper_length_mm, y=body_mass_g,color=species)) + geom_point()
```
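
As a preview of changing the colors manually, one option is `scale_color_manual()`. The sketch below reuses the red, green, and blue from the base R version on a made-up six-row data frame; `toy_penguins` and `p_scatter` are hypothetical names.

```{r}
library(ggplot2)

# Made-up rows, not the real penguins data
toy_penguins <- data.frame(
  species           = c("Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"),
  flipper_length_mm = c(185, 190, 195, 200, 215, 220),
  body_mass_g       = c(3500, 3700, 3600, 3900, 5000, 5300)
)

p_scatter <- ggplot(toy_penguins,
                    aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  geom_point() +
  # Named vector: each species is matched to its color by name
  scale_color_manual(values = c(Adelie = "red", Chinstrap = "green", Gentoo = "blue")) +
  labs(x = "Flipper length (mm)", y = "Body mass (g)")

p_scatter
```

The legend is built automatically from the `color` mapping, so there is no separate `legend()` call as in base R.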

`geom_line()` will create a line graph (connect each datapoint with a
1,666 changes: 1,666 additions & 0 deletions Intro_to_R/report.html

Large diffs are not rendered by default.
