forked from jmledford3115/datascibiol
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathlab4_hw.Rmd
78 lines (58 loc) · 3.01 KB
/
lab4_hw.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
---
title: "Lab 4 Homework"
author: "Joel Ledford"
date: "Winter 2019"
output:
html_document:
theme: spacelab
toc: no
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Instructions
Answer the following questions and complete the exercises in RMarkdown. Please embed all of your code, keep track of your versions using git, and push your final work to our [GitHub repository](https://github.com/FRS417-DataScienceBiologists). I will randomly select a few examples of student work at the start of each session to use as examples so be sure that your code is working to the best of your ability.
## Load the tidyverse
```{r message=FALSE, warning=FALSE}
library(tidyverse)
```
## Mammals Life History
Aren't mammals fun? Let's load up some more mammal data. This will be life history data for mammals. The [data](http://esapubs.org/archive/ecol/E084/093/) are from: *S. K. Morgan Ernest. 2003. Life history characteristics of placental non-volant mammals. Ecology 84:3402.*
```{r}
life_history <- readr::read_csv("data/mammal_lifehistories_v2.csv")
```
Rename some of the variables. Notice that I am replacing the old `life_history` data.
```{r}
life_history <-
life_history %>%
dplyr::rename(
genus = Genus,
wean_mass = `wean mass`,
max_life = `max. life`,
litter_size = `litter size`,
litters_yr = `litters/year`
)
```
1. Explore the data using the method that you prefer. Below, I am using a new package called `skimr`. If you want to use this, make sure that it is installed.
```{r}
#install.packages("skimr")
```
```{r}
library("skimr")
```
```{r}
life_history %>%
skimr::skim()
```
2. Run the code below. Are there any NA's in the data? Does this seem likely?
```{r}
life_history %>%
summarize(number_nas = sum(is.na(life_history)))
```
3. Are there any missing data (NA's) represented by different values? How much and where? In which variables do we have the most missing data? Can you think of a reason why so much data are missing in this variable?
4. Compared to the `msleep` data, we have better representation among taxa. Produce a summary that shows the number of observations by taxonomic order.
5. Mammals have a range of life histories, including lifespan. Produce a summary of lifespan in years by order. Be sure to include the minimum, maximum, mean, standard deviation, and total n.
6. Let's look closely at gestation and newborns. Summarize the mean gestation, newborn mass, and weaning mass by order. Add a new column that shows mean gestation in years and sort in descending order. Which group has the longest mean gestation? What is the common name for these mammals?
7. To save us the hassle in the future, how could we tell R how NA values are represented in our data file when we read it in?
## Push your final code to [GitHub](https://github.com/FRS417-DataScienceBiologists)
Make sure that you push your code into the appropriate folder. Also, be sure that you have check the `keep md` file in the knit preferences.