---
title: "Chapter 1: Data visualization using ggplot2"
description: |
Learn how to plot different types of graphs using the ggplot2 package.
author:
- name: Jewel Johnson
date: 12-02-2021
output:
distill::distill_article:
toc_depth: 3
preview: photos/ggplot2.png
---
```{r setup, include=FALSE}
library(ggplot2)
library(palmerpenguins)
library(dplyr)
library(rmarkdown)
knitr::opts_chunk$set(echo = TRUE)
```
```{r, xaringanExtra-clipboard, echo=FALSE}
htmltools::tagList(
xaringanExtra::use_clipboard(
button_text = "<i class=\"fa fa-clone fa-2x\" style=\"color: #301e64\"></i>",
success_text = "<i class=\"fa fa-check fa-2x\" style=\"color: #90BE6D\"></i>",
error_text = "<i class=\"fa fa-times fa-2x\" style=\"color: #F94144\"></i>"
),
rmarkdown::html_dependency_font_awesome()
)
```
---
#start editing from here
---
## Introduction to ggplot2 package
In this chapter we will be plotting different types of graphs using a package called `ggplot2` in R. The **gg**plot2 package is based on the '**grammar of graphics**', which provides a systematic way of building data visualizations in R. With a few lines of code you can plot a simple graph, and by adding more layers onto it you can create complex yet elegant data visualizations.
A ggplot2 graph is made up of three components.
1. **Data**: Data of your choice that you want to visually summarise.
2. **Geometry** or **geoms**: The geometry dictates the type of graph that you want to plot, and you convey it to ggplot2 through one of the `geom_*()` functions. For example, using the `geom_boxplot()` command, you can plot a box plot with your data. Likewise, there are many other types of geometry that you can plot using the ggplot2 package.
3. **Aesthetic** mappings: Aesthetics define which variables in the data are mapped to which visual properties of the plot. The most important aesthetics are the variables plotted on the x-axis and the y-axis. Another example is the colour of the data points, which can be used to differentiate categories in the data. Which aesthetics you can use depends on the geometry that you are using. We use the command `aes()` for adding aesthetic mappings to the plot. We will learn more about `aes()` in Chapter 2.
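The three components can be seen together in a minimal sketch. It assumes R's built-in `mtcars` data frame, picked here only because it needs no extra packages; the columns `wt` and `mpg` and the object name `first_plot` are illustrative choices, not part of this tutorial's dataset.

```{r}
library(ggplot2)

# Data: the built-in mtcars data frame (an illustrative choice)
# Aesthetic mappings: car weight on the x-axis, fuel economy on the y-axis
# Geometry: geom_point() draws one point per row of the data
first_plot <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point()
first_plot
```

Swapping `geom_point()` for another geometry changes the type of graph while the data and the aesthetic mappings stay the same.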
This tutorial is primarily aimed at students who are beginners in R programming and want to quickly plot their data without much hassle. So without further ado let's plot some graphs!
## Setting up the prerequisites
First, we need to install the `ggplot2` package in R as it does not come in the standard distribution of R.
* To install packages in R we use the command `install.packages()` and to load packages we use the command `library()`. Therefore, to install and load the `ggplot2` package we use the following commands.
```{r eval=FALSE}
install.packages("ggplot2")
library(ggplot2)
```
All right, we have the `ggplot2` package loaded; now we just need some data to plot. Most R programming tutorials use the `iris` dataset as an example. But this tutorial won't be like most tutorials. So let me introduce you to some lovely penguins from Palmer Station in Antarctica!
For this tutorial, we will be installing the `palmerpenguins` package which showcases body measurements taken from three different species of penguins from Antarctica. This package was made possible by the efforts of [Dr. Allison Horst](https://www.allisonhorst.com/). The penguin data was collected and made available by [Dr. Kristen Gorman](https://www.uaf.edu/cfos/people/faculty/detail/kristen-gorman.php) and the [Palmer Station, Antarctica LTER](https://pal.lternet.edu/).
* Install the `palmerpenguins` package and load it in R.
```{r eval=FALSE}
install.packages("palmerpenguins")
library(palmerpenguins)
```
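Running `install.packages()` every time re-downloads packages that are already present. One possible way around this, sketched below, is to install only what is missing. The vector names `needed_packages` and `missing_packages` are made up for this example.

```{r eval=FALSE}
# Packages this tutorial relies on
needed_packages <- c("ggplot2", "palmerpenguins")

# requireNamespace() returns TRUE if a package is installed, FALSE otherwise
is_installed <- vapply(needed_packages, requireNamespace,
                       logical(1), quietly = TRUE)
missing_packages <- needed_packages[!is_installed]

# Install only the packages that are not yet available
if (length(missing_packages) > 0) {
  install.packages(missing_packages)
}

library(ggplot2)
library(palmerpenguins)
```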
Now there are two datasets in this package. We will be using the `penguins` dataset which is a simplified version of the raw data present in the package.
* Use the command `head()` to display the first few rows of the `penguins` dataset and see what it looks like.
```{r , eval=FALSE}
library(palmerpenguins)
head(penguins)
```
```{r , echo=FALSE, layout="l-body-outset"}
paged_table(head(penguins))
```
We can see that there are 8 columns in the dataset, each holding a different variable. Now let us try plotting some graphs with this data.
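Before plotting, it can help to check the size of the dataset and where values are missing. The sketch below uses only base R functions; the name `missing_per_column` is chosen for this example.

```{r}
library(palmerpenguins)

dim(penguins)    # number of rows, then number of columns
names(penguins)  # the column names

# Count the missing (NA) values in every column
missing_per_column <- colSums(is.na(penguins))
missing_per_column
```

Columns with a non-zero count will lead to rows being dropped, usually with a warning, when they are plotted.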
### 1. Bar graph
We will start with a simple bar graph. Bar graphs are used to represent categorical data, where the height of each rectangular bar represents the value for that category. We will plot a bar graph showing the number of individuals recorded for each of the three penguin species.
* We will be using the `geom_bar()` command to plot the bar graph. Let us also use the command `theme_bw()` for a nice looking theme.
```{r, layout="l-body-outset"}
ggplot(data = penguins, aes(x = species, fill = species)) +
xlab("Species") + ylab("Frequency") +
ggtitle("Frequency of individuals for each species") +
geom_bar() + theme_bw()
```
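By default `geom_bar()` counts the rows in each category for you. To see the numbers behind the bars, a rough equivalent is to count the rows yourself and draw the counts with `geom_col()`, as sketched here. The names `species_counts` and `counts_plot` are made up for this example.

```{r, layout="l-body-outset"}
library(ggplot2)
library(palmerpenguins)

# Count the rows for each species; as.data.frame() names the count column "Freq"
species_counts <- as.data.frame(table(species = penguins$species))
species_counts

# geom_col() takes the bar heights directly from the y aesthetic
counts_plot <- ggplot(data = species_counts,
                      aes(x = species, y = Freq, fill = species)) +
  xlab("Species") + ylab("Frequency") +
  geom_col() + theme_bw()
counts_plot
```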
### 2. Histogram
Histograms look similar to bar graphs, but they are used to represent continuous data. The range of values is divided into bins, and the height of each bar shows how many observations fall in that bin. All the bins have the same width.
* We can plot a histogram using the command `geom_histogram()`.
```{r warning = TRUE, layout="l-body-outset"}
ggplot(data = penguins, aes(x = body_mass_g, fill = species)) +
xlab("Body Mass (g)") + ylab("Frequency") +
ggtitle("Frequency of individuals for respective body mass") +
geom_histogram(bins = 25) + theme_bw()
```
The warning message indicates that two rows in the dataset have `NA` (missing) values for body mass, so they were left out of the plot. Missing values are common in real-life datasets, as it is not always possible to record every measurement during data collection. So this is perfectly fine.
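If you would rather deal with the missing values explicitly, one option is to drop those rows before plotting, as in the sketch below. It also sets the bin size with `binwidth` instead of `bins`. The 200 g bin width and the names `penguins_mass` and `mass_histogram` are arbitrary choices for illustration.

```{r, layout="l-body-outset"}
library(ggplot2)
library(palmerpenguins)

# Keep only the rows where body mass was actually recorded
penguins_mass <- penguins[!is.na(penguins$body_mass_g), ]
nrow(penguins) - nrow(penguins_mass)  # number of rows dropped

# binwidth sets the width of each bin in the units of the x-axis (grams here)
mass_histogram <- ggplot(data = penguins_mass,
                         aes(x = body_mass_g, fill = species)) +
  xlab("Body Mass (g)") + ylab("Frequency") +
  geom_histogram(binwidth = 200) + theme_bw()
mass_histogram
```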
### 3. Line graph
A line graph joins the data points together, in order of their x values, to show how one variable changes with another.
* Use the command `geom_line()` for plotting a line graph.
```{r warning=FALSE, message=FALSE, layout="l-body-outset"}
ggplot(data = penguins, aes(x = bill_length_mm,
y = bill_depth_mm, colour = species)) +
xlab("Bill length (mm)") + ylab("Bill depth (mm)") +
ggtitle("Bill length vs Bill depth") + geom_line() + theme_bw()
```
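Line graphs are often easiest to read when each x value has a single y value per group. As a sketch of that situation, the code below first summarises the data to the mean body mass per species and year using `dplyr`, then joins the means with lines. The names `mass_by_year`, `mean_mass` and `yearly_plot` are invented for this example.

```{r warning=FALSE, message=FALSE, layout="l-body-outset"}
library(ggplot2)
library(dplyr)
library(palmerpenguins)

# One row per species and year, holding the mean body mass (NA values ignored)
mass_by_year <- penguins %>%
  group_by(species, year) %>%
  summarise(mean_mass = mean(body_mass_g, na.rm = TRUE), .groups = "drop")

yearly_plot <- ggplot(data = mass_by_year,
                      aes(x = year, y = mean_mass, colour = species)) +
  xlab("Year") + ylab("Mean body mass (g)") +
  geom_line() + geom_point() +
  scale_x_continuous(breaks = unique(mass_by_year$year)) +
  theme_bw()
yearly_plot
```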
### 4. Scatter plot
A scatter plot draws each observation in the dataset as a point, positioned by its values on the x-axis and the y-axis.
* Use the command `geom_point()` to plot a scatter plot.
```{r warning=FALSE, message=FALSE, layout="l-body-outset"}
ggplot(data = penguins, aes(x = body_mass_g, y = flipper_length_mm,
shape = species, colour = species)) +
xlab("Body mass (g)") + ylab("Flipper length (mm)") +
ggtitle("Body mass vs Flipper length") + geom_point(size = 2) + theme_bw()
```
### 5. Density Plot
Density plots are similar to histograms, but they show the overall distribution of the data as a smooth curve rather than as bars. If our data follows a normal distribution, we will get a bell-shaped curve.
* Use the command `geom_density()` to plot a density plot.
```{r warning = FALSE, layout="l-body-outset"}
ggplot(data = penguins, aes(x = body_mass_g, fill = species)) +
xlab("Body Mass (g)") + ylab("Density") + ggtitle("Body mass distribution") +
geom_density() + theme_bw()
```
Since we plotted all three species together, the graph looks cluttered. Let us try plotting the same graph for only Gentoo penguins. We will use the `dplyr` package to `filter()` the data for Gentoo penguins alone. The `dplyr` package does not come with the standard distribution of R, so install it with `install.packages("dplyr")` if you have not already, and then load it using the command `library()`.
```{r warning = FALSE, message=FALSE, layout="l-body-outset"}
library(dplyr)
penguins_gentoo <- penguins %>% filter(species == "Gentoo")
ggplot(data = penguins_gentoo, aes(x = body_mass_g)) +
xlab("Body Mass of Gentoo penguins (g)") + ylab("Density") +
ggtitle("Body mass distribution of Gentoo penguins") +
geom_density(fill = "red") + theme_bw()
```
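Filtering to one species is one way to unclutter the plot. Another option, sketched below, is to keep all three species and make the fills semi-transparent with the `alpha` argument so that the overlapping curves remain visible. The value 0.5 and the name `overlap_plot` are arbitrary choices for this example.

```{r warning = FALSE, layout="l-body-outset"}
library(ggplot2)
library(palmerpenguins)

# alpha ranges from 0 (fully transparent) to 1 (fully opaque)
overlap_plot <- ggplot(data = penguins, aes(x = body_mass_g, fill = species)) +
  xlab("Body Mass (g)") + ylab("Density") +
  ggtitle("Body mass distribution") +
  geom_density(alpha = 0.5, na.rm = TRUE) + theme_bw()
overlap_plot
```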
### 6. Dot-plot
A dot-plot is similar to a density plot, but it shows each data point in the distribution as a separate dot.
* Use the command `geom_dotplot()` to plot a dot-plot.
```{r warning=FALSE, message=FALSE, layout="l-body-outset"}
ggplot(data = penguins, aes(x = species, y = body_mass_g, fill = species)) +
xlab("Species") + ylab("Body mass (g)") +
ggtitle("Body mass in three different species of penguins") +
geom_dotplot(binaxis = "y", stackdir = "center", binwidth = 100) + theme_bw()
```
### 7. Rug-plot
A rug-plot is a simple way to visualize the distribution of data as small tick marks along the axis lines. It is often used in conjunction with other graphical representations.
* Use the command `geom_rug()` to plot a rug-plot.
```{r warning = FALSE, message=FALSE, layout="l-body-outset"}
penguins_gentoo <- penguins %>% filter(species == "Gentoo")
ggplot(data = penguins_gentoo, aes(x = body_mass_g, y = flipper_length_mm)) +
xlab("Body Mass of Gentoo penguins (g)") + ylab("Flipper length of Gentoo penguins (mm)") +
ggtitle("Body mass vs flipper length of Gentoo penguins") +
geom_point(colour = "darkred") + geom_rug() + theme_bw()
```
### 8. Box plot
A box-plot summarises the data through its quartiles: the box spans the first to the third quartile, and the line inside the box marks the median. You can learn more about box plots here.
* Use the command `geom_boxplot()` to plot a box-plot.
```{r warning=FALSE, message=FALSE, layout="l-body-outset"}
ggplot(data = penguins, aes(x = species, y = body_mass_g, colour = species)) +
xlab("Species") + ylab("Body mass (g)") +
ggtitle("Body mass in three different species of penguins") + geom_boxplot() +
theme_bw()
```
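To connect the box to actual numbers, the quartiles can be computed directly with the base R function `quantile()`. The sketch below does this for Gentoo penguins only; the names `gentoo_mass` and `gentoo_quartiles` are made up for this example, and the values should roughly correspond to the bottom edge, middle line and top edge of the Gentoo box.

```{r}
library(palmerpenguins)

# Body mass values for the Gentoo penguins only
gentoo_mass <- penguins$body_mass_g[penguins$species == "Gentoo"]

# First quartile, median and third quartile, ignoring missing values
gentoo_quartiles <- quantile(gentoo_mass, probs = c(0.25, 0.5, 0.75),
                             na.rm = TRUE)
gentoo_quartiles
```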
### 9. Violin plot
A violin plot can be considered the best of both a box-plot and a density plot. Its outline shows the distribution of the data, like in a density plot, and when a box-plot is drawn inside it, the quartile values are shown as well.
* Use the command `geom_violin()` in conjunction with `geom_boxplot()` to plot a violin plot.
```{r warning=FALSE, message=FALSE, layout="l-body-outset"}
ggplot(data = penguins, aes(x = species, y = body_mass_g, fill = species)) +
xlab("Species") + ylab("Body mass (g)") +
ggtitle("Body mass in three different species of penguins") +
geom_violin(aes(colour = species), trim = TRUE) + geom_boxplot(width = 0.2) +
theme_bw()
```
## Saving your ggplot2 graphs
* Use the command `ggsave()` to save the graph locally. In the code below, `my_graph` is the ggplot object containing your graph. The plot will be saved in your working directory.
```{r, eval=FALSE}
my_graph <- ggplot(data = penguins, aes(x = species, y = body_mass_g,
fill = species)) +
xlab("Species") + ylab("Body mass (g)") +
ggtitle("Body mass in three different species of penguins") +
geom_violin(aes(colour = species), trim = TRUE) +
geom_boxplot(width = 0.2) +
theme_bw()
#to save the plot
ggsave(my_graph, filename = "your_graph_name.png", width = 20, height = 20,
units = "cm")
```
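`ggsave()` picks the file format from the extension of the file name, and it accepts a full path as well as a bare file name. The sketch below saves a plot as a PDF into a temporary folder and then checks that the file exists. The names `simple_graph` and `output_file`, and the use of `tempdir()`, are choices made for this example only.

```{r, eval=FALSE}
library(ggplot2)
library(palmerpenguins)

simple_graph <- ggplot(data = penguins, aes(x = species, y = body_mass_g)) +
  xlab("Species") + ylab("Body mass (g)") +
  geom_boxplot(na.rm = TRUE) + theme_bw()

# Build a full path; tempdir() keeps the example out of your working directory
output_file <- file.path(tempdir(), "penguin_boxplot.pdf")

# The ".pdf" extension tells ggsave() to write a PDF file
ggsave(filename = output_file, plot = simple_graph,
       width = 20, height = 20, units = "cm")

file.exists(output_file)  # check that the file was written
```

Replace `output_file` with a path of your own to keep the saved graph.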
## Summary
I hope this tutorial helped you to get familiarised with the `ggplot2` commands. There are many more different types of graphs that you can plot using `ggplot2`. The tutorial only showed some of the commonly used ones. The best way to learn R is through actually doing it yourself. Try to recreate the examples given in this tutorial by yourself and then try what you learned with the different datasets available in R. Have a good day!
---
#bottom buttons to guide to next/previous chapters.
---
<br>
<a href="project2.html" class="btn button_round" style="float: right;">Next chapter:<br> 2. Customizing graphs in ggplot2</a>
<!-- adding share buttons on the right side of the page -->
<!-- AddToAny BEGIN -->
<div class="a2a_kit a2a_kit_size_32 a2a_floating_style a2a_vertical_style" style="right:0px; top:150px;" data-a2a-url="https://jeweljohnsonj.github.io/jeweljohnson.github.io/" data-a2a-title="One-carat Blog">
<a class="a2a_button_twitter"></a>
<a class="a2a_button_whatsapp"></a>
<a class="a2a_button_telegram"></a>
<a class="a2a_button_google_gmail"></a>
<a class="a2a_button_pinterest"></a>
<a class="a2a_button_reddit"></a>
<a class="a2a_button_facebook"></a>
<a class="a2a_button_facebook_messenger"></a>
</div>
<script>
var a2a_config = a2a_config || {};
a2a_config.onclick = 1;
</script>
<script async src="https://static.addtoany.com/menu/page.js"></script>
<!-- AddToAny END -->
## References
1. H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016. Read more about ggplot2 [here](https://ggplot2.tidyverse.org/). You can also look at the [cheat sheet](https://github.com/rstudio/cheatsheets/blob/master/data-visualization-2.1.pdf) for all the syntax used in `ggplot2`. Also check this [out](https://github.com/erikgahner/awesome-ggplot2).
2. Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer Archipelago (Antarctica) penguin data. R package version 0.1.0. https://allisonhorst.github.io/palmerpenguins/. doi: 10.5281/zenodo.3960218.
## Last updated on {.appendix}
```{r, echo=FALSE}
Sys.time()
```