core-methods-in-edm · dl3418 · Oct 6, 2020
diff --git a/Assignment 2-2020.Rmd b/Assignment 2-2020.Rmd
@@ -1,6 +1,6 @@
 ---
 title: "Assignment 2"
-author: "Charles Lang"
+author: "Dan Lei"
 date: "September 24, 2020"
 output: html_document
 ---
@@ -96,12 +96,25 @@ pairs(D5)
 #round() rounds numbers to whole number values
 #sample() draws a random sample from the groups vector according to a uniform distribution
 
-
+score <- rnorm(100, 75, 15)
+hist(score,breaks = 30)
+S1 <- data.frame(score)
+library(dplyr)
+S1 <- filter(S1, score <= 100)
+hist(S1$score)
+S2 <- data.frame(rep(100, 100-nrow(S1)))
+names(S2) <- "score"
+S3 <- bind_rows(S1,S2)
+S3$score <- round(S3$score,0)
+interest <- c("sport", "music", "nature", "literature")
+S3$interest <- sample(interest, 100, replace = TRUE)
+S3$stid <- seq(1,100,1)
 ```
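The filter-then-pad steps in the chunk above have the effect of capping simulated scores at 100. A more compact sketch of the same idea, assuming base R's `pmin()` is acceptable for the assignment; the names `raw` and `capped` and the fixed seed are illustrative and not part of the original:

```r
set.seed(1)  # fixed seed so the sketch is reproducible (an addition, not in the original)
raw <- rnorm(100, 75, 15)

# pmin() replaces every value above 100 with 100 and leaves the rest unchanged
capped <- round(pmin(raw, 100), 0)

length(capped)       # still 100 scores
max(capped) <= 100   # no score exceeds the cap
```

Unlike the original approach, this keeps the capped values in their original positions rather than appending them at the end of the data frame.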
 
 2. Using base R commands, draw a histogram of the scores. Change the breaks in your histogram until you think they best represent your data.
 
 ```{r}
+hist(S3$score, breaks = 10)
 
 ```
 
@@ -111,55 +124,70 @@ pairs(D5)
 ```{r}
 #cut() divides the range of scores into intervals and codes each score according to the interval it falls into. We use a vector called `letters` as the labels; `letters` is a built-in vector made up of the letters of the alphabet.
 
+label <- letters[1:10]
+S3$breaks <- cut(S3$score, breaks = 10, labels = label)
+
 ```
 
 4. Now using the colorbrewer package (RColorBrewer; http://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3) design a palette and assign it to the groups in your data on the histogram.
 
 ```{r}
 library(RColorBrewer)
 #Let's look at the available palettes in RColorBrewer
-
+display.brewer.all()
 #The top section of palettes are sequential, the middle section are qualitative, and the lower section are diverging.
 #Make RColorBrewer palette available to R and assign to your bins
-
+S3$colors <- brewer.pal(10, "Set3")[S3$breaks]
 #Use named palette in histogram
-
+hist(S3$score, breaks = 10, col = brewer.pal(10, "Set3"))
 ```
 
 
 5. Create a boxplot that visualizes the scores for each interest group and color each interest group a different color.
 
 ```{r}
 #Make a vector of the colors from RColorBrewer
+interest.col <- brewer.pal(4, "Dark2")
 
+boxplot(score ~ interest, S3, col = interest.col)
 ```
 
 
 6. Now simulate a new variable that describes the number of logins that students made to the educational game. They should vary from 1-25.
 
 ```{r}
-
+S3$login <- sample(1:25, 100, replace = TRUE)
 ```
 
 7. Plot the relationships between logins and scores. Give the plot a title and color the dots according to interest group.
 
 ```{r}
+S3$col1 <- brewer.pal(4, "Dark2")[factor(S3$interest)]
 
+plot(S3$login, S3$score, col = S3$col1, main = "Student Logins vs. Scores")
 
 ```
 
 
 8. R contains several inbuilt data sets, one of which is called AirPassengers. Plot a line graph of the airline passengers over time using this data set.
 
 ```{r}
-
+AP <- data.frame(AirPassengers)
+plot(AirPassengers)
 ```
 
 
 9. Using another inbuilt data set, iris, plot the relationships between all of the variables in the data set. Which of these relationships is it appropriate to run a correlation on? 
 
 ```{r}
-
+Iris <- data.frame(iris)
+plot(iris)
+plot(Iris$Sepal.Length, Iris$Sepal.Width)
+plot(Iris$Sepal.Length, Iris$Petal.Length)
+plot(Iris$Sepal.Length, Iris$Petal.Width)
+plot(Iris$Sepal.Width, Iris$Petal.Length)
+plot(Iris$Sepal.Width, Iris$Petal.Width)
+plot(Iris$Petal.Length, Iris$Petal.Width)
 ```
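On the correlation question in step 9: a Pearson correlation needs numeric variables, so one reading is that the four measurement columns qualify and `Species` (a factor) does not. A minimal sketch under that assumption; `num_cols` and `iris_cor` are illustrative names, not from the original:

```r
# Keep only the numeric columns of the built-in iris data
num_cols <- sapply(iris, is.numeric)

# Pairwise Pearson correlations between the four measurements
iris_cor <- cor(iris[, num_cols])
round(iris_cor, 2)
```

Reading the matrix alongside the `plot(iris)` panels helps confirm which pairs look roughly linear before interpreting the coefficients.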
 
 # Part III - Analyzing Swirl
@@ -173,6 +201,13 @@ In this repository you will find data describing Swirl activity from the class s
 1. Insert a new code block
 2. Create a data frame from the `swirl-data.csv` file called `DF1`
 
+```{r}
+DF1 <- read.csv("swirl-data.csv", header = TRUE)
+
+```
+
+
+
 The variables are:
 
 `course_name` - the name of the R course the student attempted  
@@ -185,14 +220,27 @@ The variables are:
 `hash` - anonymized student ID  
 
 3. Create a new data frame that only includes the variables `hash`, `lesson_name` and `attempt` called `DF2`
+```{r}
+
+DF2 <- select(DF1, hash, lesson_name, attempt)
+
+```
 
 4. Use the `group_by` function to create a data frame that sums all the attempts for each `hash` by each `lesson_name` called `DF3`
+```{r}
+
+DF3 <- DF2 %>% group_by(hash, lesson_name) %>% summarise(attempt = sum(attempt))
+
+```
 
 5. On a scrap piece of paper draw what you think `DF3` would look like if all the lesson names were column names
 
 6. Convert `DF3` to this format  
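Step 6 has no code in this commit. One way to reshape a long table into one column per lesson is `tidyr::pivot_wider()` (available in tidyr 1.0.0 and later). The sketch below uses a small made-up data frame, `toy_long`, in place of `DF3`, since the real swirl data is not shown here; the lesson names and values are invented:

```r
library(tidyr)

# Hypothetical stand-in for DF3: one row per student-lesson pair
toy_long <- data.frame(
  hash = c(1, 1, 2),
  lesson_name = c("Basic Building Blocks", "Vectors", "Vectors"),
  attempt = c(5, 3, 7)
)

# Lesson names become columns; student-lesson pairs with no data become NA
toy_wide <- pivot_wider(toy_long, names_from = lesson_name, values_from = attempt)
toy_wide
```

If the real `lesson_name` column contains blank or missing values, those rows would probably need filtering out first, since they would otherwise become an oddly named column.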
 
 7. Create a new data frame from `DF1` called `DF4` that only includes the variables `hash`, `lesson_name` and `correct`
+```{r}
+
+```
 
 8. Convert the `correct` variable so that `TRUE` is coded as the **number** `1` and `FALSE` is coded as `0`
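For step 8, a hedged sketch: how `correct` is stored after `read.csv()` depends on the file contents (logical, character, or factor), so the conversion below goes through `as.character()` and `as.logical()` before becoming numeric. The `toy_correct` data frame is invented for illustration and stands in for `DF4`:

```r
# Hypothetical stand-in for DF4$correct, stored as text with one missing value
toy_correct <- data.frame(
  correct = c("TRUE", "FALSE", NA, "TRUE"),
  stringsAsFactors = FALSE
)

# as.character() guards against factors; as.logical() parses "TRUE"/"FALSE";
# as.numeric() then maps TRUE to 1 and FALSE to 0, keeping NA as NA
toy_correct$correct <- as.numeric(as.logical(as.character(toy_correct$correct)))
toy_correct$correct
```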