| First, use flag_colors <- flags[, 11:17] to extract the columns containing the color data and store them in a new data frame called
| flag_colors. (Note the comma before 11:17. This subsetting command tells R that we want all rows, but only columns 11 through 17.)
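The [rows, columns] form can be tried without the lesson's flags data set. The following is a minimal sketch, assuming a small made-up data frame called toy (hypothetical, not part of the lesson); the column names and values are invented for illustration.

```r
# Hypothetical stand-in for the lesson's flags data (names and values invented).
toy <- data.frame(
  name  = c("A", "B", "C"),
  red   = c(1, 0, 1),
  green = c(0, 1, 1),
  blue  = c(1, 1, 0)
)

# Leaving the row index empty (nothing before the comma) keeps every row;
# 2:4 keeps only columns 2 through 4.
toy_colors <- toy[, 2:4]

dim(toy_colors)    # 3 rows, 3 columns
names(toy_colors)  # "red" "green" "blue"
```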
|============================= | 23%
| Use dim(my_vector) to confirm that we've set the `dim` attribute correctly.
> dim(my_vector)
[1] 4 5
| Keep working like that and you'll get there!
|================================= | 26%
| Another way to see this is by calling the attributes() function on my_vector. Try it now.
> attributes(my_vector)
$dim
[1] 4 5
| Great job!
|===================================== | 29%
| Just like in math class, when dealing with a 2-dimensional object (think rectangular table), the first number is the number of rows and
| the second is the number of columns. Therefore, we just gave my_vector 4 rows and 5 columns.
---------COLUMNS--------------------
> my_vector
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20
| All that practice is paying off!
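The rows-then-columns convention can be checked directly. This sketch assumes my_vector started out as the plain vector 1:20, which matches the values printed above but is not shown in this part of the log.

```r
# Start from a plain integer vector of the numbers 1 through 20 (assumed).
my_vector <- 1:20
dim(my_vector)            # NULL: a plain vector has no dim attribute yet

# Assigning c(4, 5) gives it 4 rows and 5 columns, filled column by column.
dim(my_vector) <- c(4, 5)

nrow(my_vector)           # 4
ncol(my_vector)           # 5
my_vector[2, 3]           # row 2, column 3 -> 10
```

Indexing follows the same order as dim: the row comes first, then the column.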
|============================================ | 34%
| Now, let's confirm it's actually a matrix by using the class() function. Type class(my_vector) to see what I mean.
> class(my_vector)
[1] "matrix"
| You are quite good my friend!
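The output above comes from an older R release. In R 4.0.0 and later, class() on a matrix returns two values, "matrix" and "array". A small sketch of checks that should behave the same on either version (the variable name m is chosen here for illustration):

```r
m <- 1:20
dim(m) <- c(4, 5)

# In R 4.0.0 and later, class(m) is c("matrix", "array");
# older versions return just "matrix".
class(m)

# These checks give the same answer in either case.
is.matrix(m)            # TRUE
inherits(m, "matrix")   # TRUE
```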
|================================================ | 37%
| Sure enough, my_vector is now a matrix. We should store it in a new variable that helps us remember what it is. Store the value of
| my_vector in a new variable called my_matrix.
> my_matrix <- my_vector
| Perseverance, that's the answer.
|==================================================== | 40%
| The example that we've used so far was meant to illustrate the point that a matrix is simply an atomic vector with a dimension
| attribute. A more direct method of creating the same matrix uses the matrix() function.
...
|======================================================= | 43%
| Bring up the help file for the matrix() function now using the `?` function.
> ?matrix()
| Almost! Try again. Or, type info() for more options.
| The command ?matrix will do the trick.
> ?matrix
| You are quite good my friend!
|=========================================================== | 46%
| Now, look at the documentation for the matrix function and see if you can figure out how to create a matrix containing the same numbers
| (1-20) and dimensions (4 rows, 5 columns) by calling the matrix() function. Store the result in a variable called my_matrix2.
> my_matrix2 <- matrix(1:20,4,5)
| That's a job well done!
|=============================================================== | 49%
| Finally, let's confirm that my_matrix and my_matrix2 are actually identical. The identical() function will tell us if its first two
| arguments are the same. Try it out.
> identical(my_matrix,my_matrix2)
[1] TRUE
| You're the best!
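The positional call matrix(1:20,4,5) can also be written with named arguments, which may be easier to read later. The sketch below rebuilds both matrices from scratch and adds one comparison that is not in the lesson: byrow = TRUE, stored in a variable named by_row for illustration.

```r
# Named arguments make the intent explicit; data is filled column by column.
my_matrix2 <- matrix(1:20, nrow = 4, ncol = 5)

# The dim-attribute route, for comparison.
my_matrix <- 1:20
dim(my_matrix) <- c(4, 5)

identical(my_matrix, my_matrix2)    # TRUE

# byrow = TRUE fills row by row instead, giving a different matrix.
by_row <- matrix(1:20, nrow = 4, ncol = 5, byrow = TRUE)
identical(my_matrix, by_row)        # FALSE
by_row[1, ]                         # 1 2 3 4 5
```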
|================================================================== | 51%
| Now, imagine that the numbers in our table represent some measurements from a clinical experiment, where each row represents one patient
| and each column represents one variable for which measurements were taken.
...
|====================================================================== | 54%
| We may want to label the rows, so that we know which numbers belong to each patient in the experiment. One way to do this is to add a
| column to the matrix, which contains the names of all four people.
...
|========================================================================== | 57%
| Let's start by creating a character vector containing the names of our patients -- Bill, Gina, Kelly, and Sean. Remember that double
| quotes tell R that something is a character string. Store the result in a variable called patients.
> patients <- c("Bill","Gina","Kelly","Sean")
| All that practice is paying off!
|============================================================================= | 60%
| Now we'll use the cbind() function to 'combine columns'. Don't worry about storing the result in a new variable. Just call cbind() with
| two arguments -- the patients vector and my_matrix.
> cbind(patients,my_matrix)
     patients
[1,] "Bill"   "1" "5" "9"  "13" "17"
[2,] "Gina"   "2" "6" "10" "14" "18"
[3,] "Kelly"  "3" "7" "11" "15" "19"
[4,] "Sean"   "4" "8" "12" "16" "20"
| You nailed it! Good job!
|================================================================================= | 63%
| Something is fishy about our result! It appears that combining the character vector with our matrix of numbers caused everything to be
| enclosed in double quotes. This means we're left with a matrix of character strings, which is no good.
...
|===================================================================================== | 66%
| If you remember back to the beginning of this lesson, I told you that matrices can only contain ONE class of data. Therefore, when we
| tried to combine a character vector with a numeric matrix, R was forced to 'coerce' the numbers to characters, hence the double quotes.
...
|======================================================================================== | 69%
| This is called 'implicit coercion', because we didn't ask for it. It just happened. But why didn't R just convert the names of our
| patients to numbers? I'll let you ponder that question on your own.
...
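One plausible answer to that question: every number can be written as a character string, but a name such as "Bill" has no numeric form, so converting toward character is the only direction that keeps all the data. A short sketch of both directions (the variable name combined is chosen here for illustration):

```r
patients <- c("Bill", "Gina", "Kelly", "Sean")
my_matrix <- matrix(1:20, nrow = 4, ncol = 5)

# Mixing character and numeric data yields a character matrix.
combined <- cbind(patients, my_matrix)
typeof(combined)                 # "character"

# Every number has a character form, so this direction loses nothing.
as.character(17)                 # "17"

# The reverse does not work: a name has no numeric form, so R returns NA
# (with a warning, silenced here).
suppressWarnings(as.numeric("Bill"))   # NA
```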
|============================================================================================ | 71%
| So, we're still left with the question of how to include the names of our patients in the table without destroying the integrity of our
| numeric data. Try the following -- my_data <- data.frame(patients, my_matrix)
> data.frame(patients,my_matrix)
  patients X1 X2 X3 X4 X5
1     Bill  1  5  9 13 17
2     Gina  2  6 10 14 18
3    Kelly  3  7 11 15 19
4     Sean  4  8 12 16 20
| Almost! Try again. Or, type info() for more options.
| Type my_data <- data.frame(patients, my_matrix), so we can explore what happens.
> my_data <- data.frame(patients,my_matrix)
| Excellent job!
|================================================================================================ | 74%
| Now view the contents of my_data to see what we've come up with.
> my_data
  patients X1 X2 X3 X4 X5
1     Bill  1  5  9 13 17
2     Gina  2  6 10 14 18
3    Kelly  3  7 11 15 19
4     Sean  4  8 12 16 20
| You are amazing!
|==================================================================================================== | 77%
| It looks like the data.frame() function allowed us to store our character vector of names right alongside our matrix of numbers. That's
| exactly what we were hoping for!
...
|======================================================================================================= | 80%
| Behind the scenes, the data.frame() function takes any number of arguments and returns a single object of class `data.frame` that is
| composed of the original objects.
...
|=========================================================================================================== | 83%
| Let's confirm this by calling the class() function on our newly created data frame.
> class(my_data)
[1] "data.frame"
| Keep working like that and you'll get there!
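Unlike the cbind() result, the data frame keeps a separate type for each column. The sketch below rebuilds my_data from scratch and checks which columns are numeric. The names column is a character vector in R 4.0.0 and later and a factor in older versions; it is non-numeric either way.

```r
patients <- c("Bill", "Gina", "Kelly", "Sean")
my_matrix <- matrix(1:20, nrow = 4, ncol = 5)
my_data <- data.frame(patients, my_matrix)

# Each column keeps its own type: one column of names, five integer columns.
sapply(my_data, is.numeric)
# patients is FALSE, X1 through X5 are TRUE

dim(my_data)   # 4 rows, 6 columns
```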
|=============================================================================================================== | 86%
| It's also possible to assign names to the individual rows and columns of a data frame, which presents another possible way of
| determining which row of values in our table belongs to each patient.
...
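The lesson does not take the row-name route, but a minimal sketch of it might look like the following. The variable by_row_name is hypothetical and introduced only for this example.

```r
my_matrix <- matrix(1:20, nrow = 4, ncol = 5)
patients <- c("Bill", "Gina", "Kelly", "Sean")

# Build a data frame from the numbers only, then label its rows.
by_row_name <- data.frame(my_matrix)
rownames(by_row_name) <- patients

# Rows can now be looked up by patient name.
by_row_name["Kelly", ]
by_row_name["Kelly", "X3"]   # 11
```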
|================================================================================================================== | 89%
| However, since we've already solved that problem, let's solve a different problem by assigning names to the columns of our data frame so
| that we know what type of measurement each column represents.
...
|====================================================================================================================== | 91%
| Since we have six columns (including patient names), we'll need to first create a vector containing one element for each column. Create
| a character vector called cnames that contains the following values (in order) -- "patient", "age", "weight", "bp", "rating", "test".
> cnames <- c("patient","age","weight","bp","rating","test")
| You are really on a roll!
|========================================================================================================================== | 94%
| Now, use the colnames() function to set the `colnames` attribute for our data frame. This is similar to the way we used the dim()
| function earlier in this lesson.
> colnames(cnames)
NULL
| That's not the answer I was looking for, but try again. Or, type info() for more options.
| Try colnames(my_data) <- cnames.
> colnames(my_data) <- cnames
| You got it!
|============================================================================================================================= | 97%
| Let's see if that got the job done. Print the contents of my_data.
> print(my_data)
  patient age weight bp rating test
1    Bill   1      5  9     13   17
2    Gina   2      6 10     14   18
3   Kelly   3      7 11     15   19
4    Sean   4      8 12     16   20
| You almost had it, but not quite. Try again. Or, type info() for more options.
| Print the contents of my_data to the console.
> my_data
  patient age weight bp rating test
1    Bill   1      5  9     13   17
2    Gina   2      6 10     14   18
3   Kelly   3      7 11     15   19
4    Sean   4      8 12     16   20
| Perseverance, that's the answer.
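The first attempt above, colnames(cnames), returned NULL because it asked for the column names of the character vector itself. The sketch below rebuilds the objects from scratch and contrasts that call with the replacement form that swirl was looking for.

```r
patients <- c("Bill", "Gina", "Kelly", "Sean")
my_data <- data.frame(patients, matrix(1:20, nrow = 4, ncol = 5))
cnames <- c("patient", "age", "weight", "bp", "rating", "test")

# colnames(cnames) is NULL because a plain character vector has no columns.
is.null(colnames(cnames))    # TRUE

# The replacement form assigns the names to the data frame's columns.
colnames(my_data) <- cnames

colnames(my_data)            # the six names in cnames
my_data$weight               # 5 6 7 8
```

Once the columns are named, each one can be pulled out with the $ operator, as in the last line.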
|=================================================================================================================================| 100%
| In this lesson, you learned the basics of working with two very important and common data structures -- matrices and data frames.
| There's much more to learn and we'll be covering more advanced topics, particularly with respect to data frames, in future lessons.
...