<!DOCTYPE html>
<html lang="en" xml:lang="en">
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<title>Chapter 11 Tell Your Story with Data | Statistical Inference via Data Science</title>
<meta name="description" content="An open-source and fully-reproducible electronic textbook for teaching statistical inference using tidyverse data science tools." />
<meta name="generator" content="bookdown 0.22 and GitBook 2.6.7" />
<meta property="og:title" content="Chapter 11 Tell Your Story with Data | Statistical Inference via Data Science" />
<meta property="og:type" content="book" />
<meta property="og:url" content="https://moderndive.com/" />
<meta property="og:image" content="https://moderndive.com//images/logos/book_cover.png" />
<meta property="og:description" content="An open-source and fully-reproducible electronic textbook for teaching statistical inference using tidyverse data science tools." />
<meta name="github-repo" content="moderndive/ModernDive_book" />
<meta name="twitter:card" content="summary" />
<meta name="twitter:title" content="Chapter 11 Tell Your Story with Data | Statistical Inference via Data Science" />
<meta name="twitter:site" content="@ModernDive" />
<meta name="twitter:description" content="An open-source and fully-reproducible electronic textbook for teaching statistical inference using tidyverse data science tools." />
<meta name="twitter:image" content="https://moderndive.com//images/logos/book_cover.png" />
<meta name="author" content="Chester Ismay and Albert Y. Kim; Foreword by Kelly S. McConville; Adapted by William R. Morgan" />
<meta name="date" content="2021-07-28" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black" />
<link rel="apple-touch-icon-precomposed" sizes="152x152" href="images/logos/favicons/apple-touch-icon.png" />
<link rel="shortcut icon" href="images/logos/favicons/favicon.ico" type="image/x-icon" />
<link rel="prev" href="10-inference-for-regression.html"/>
<link rel="next" href="A-appendixA.html"/>
<script src="libs/header-attrs-2.9/header-attrs.js"></script>
<script src="libs/jquery-2.2.3/jquery.min.js"></script>
<link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-table.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-bookdown.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-highlight.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-search.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-fontsettings.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-clipboard.css" rel="stylesheet" />
<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
<script src="libs/kePrint-0.0.1/kePrint.js"></script>
<link href="libs/lightable-0.0.1/lightable.css" rel="stylesheet" />
<script src="libs/htmlwidgets-1.5.3/htmlwidgets.js"></script>
<link href="libs/dygraphs-1.1.1/dygraph.css" rel="stylesheet" />
<script src="libs/dygraphs-1.1.1/dygraph-combined.js"></script>
<script src="libs/dygraphs-1.1.1/shapes.js"></script>
<script src="libs/moment-2.8.4/moment.js"></script>
<script src="libs/moment-timezone-0.2.5/moment-timezone-with-data.js"></script>
<script src="libs/moment-fquarter-1.0.0/moment-fquarter.min.js"></script>
<script src="libs/dygraphs-binding-1.1.1.6/dygraphs.js"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-89938436-1', 'auto');
ga('send', 'pageview');
</script>
<style type="text/css">
pre > code.sourceCode { white-space: pre; position: relative; }
pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
pre > code.sourceCode > span:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
pre > code.sourceCode { white-space: pre-wrap; }
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
{ counter-reset: source-line 0; }
pre.numberSource code > span
{ position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
{ content: counter(source-line);
position: relative; left: -1em; text-align: right; vertical-align: baseline;
border: none; display: inline-block;
-webkit-touch-callout: none; -webkit-user-select: none;
-khtml-user-select: none; -moz-user-select: none;
-ms-user-select: none; user-select: none;
padding: 0 4px; width: 4em;
color: #aaaaaa;
}
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; }
div.sourceCode
{ }
@media screen {
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
</style>
<style type="text/css">
/* Used with Pandoc 2.11+ new --citeproc when CSL is used */
div.csl-bib-body { }
div.csl-entry {
clear: both;
}
.hanging div.csl-entry {
margin-left:2em;
text-indent:-2em;
}
div.csl-left-margin {
min-width:2em;
float:left;
}
div.csl-right-inline {
margin-left:2em;
padding-left:1em;
}
div.csl-indent {
margin-left: 2em;
}
</style>
<link rel="stylesheet" href="style.css" type="text/css" />
</head>
<body>
<div class="book without-animation with-summary font-size-2 font-family-1" data-basepath=".">
<div class="book-summary">
<nav role="navigation">
<ul class="summary">
<li class="chapter" data-level="" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i>Welcome to ModernDive</a></li>
<li class="chapter" data-level="" data-path="foreword.html"><a href="foreword.html"><i class="fa fa-check"></i>Foreword</a></li>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html"><i class="fa fa-check"></i>Preface</a>
<ul>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#introduction-for-students"><i class="fa fa-check"></i>Introduction for students</a>
<ul>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#what-we-hope-you-will-learn-from-this-book"><i class="fa fa-check"></i>What we hope you will learn from this book</a></li>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#datascience-pipeline"><i class="fa fa-check"></i>Data/science pipeline</a></li>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#reproducible-research"><i class="fa fa-check"></i>Reproducible research</a></li>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#final-note-for-students"><i class="fa fa-check"></i>Final note for students</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#introduction-for-instructors"><i class="fa fa-check"></i>Introduction for instructors</a>
<ul>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#resources"><i class="fa fa-check"></i>Resources</a></li>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#why-did-we-write-this-book"><i class="fa fa-check"></i>Why did we write this book?</a></li>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#who-is-this-book-for"><i class="fa fa-check"></i>Who is this book for?</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#connect-and-contribute"><i class="fa fa-check"></i>Connect and contribute</a></li>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#acknowledgements"><i class="fa fa-check"></i>Acknowledgements</a></li>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#about-this-book"><i class="fa fa-check"></i>About this book</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="about-the-authors.html"><a href="about-the-authors.html"><i class="fa fa-check"></i>About the authors</a></li>
<li class="chapter" data-level="1" data-path="1-getting-started.html"><a href="1-getting-started.html"><i class="fa fa-check"></i><b>1</b> Getting Started with Data in R</a>
<ul>
<li class="chapter" data-level="1.1" data-path="1-getting-started.html"><a href="1-getting-started.html#r-rstudio"><i class="fa fa-check"></i><b>1.1</b> What are R and RStudio?</a>
<ul>
<li class="chapter" data-level="1.1.1" data-path="1-getting-started.html"><a href="1-getting-started.html#installing"><i class="fa fa-check"></i><b>1.1.1</b> Installing R and RStudio</a></li>
<li class="chapter" data-level="1.1.2" data-path="1-getting-started.html"><a href="1-getting-started.html#using-r-via-rstudio"><i class="fa fa-check"></i><b>1.1.2</b> Using R via RStudio</a></li>
</ul></li>
<li class="chapter" data-level="1.2" data-path="1-getting-started.html"><a href="1-getting-started.html#code"><i class="fa fa-check"></i><b>1.2</b> How do I code in R?</a>
<ul>
<li class="chapter" data-level="1.2.1" data-path="1-getting-started.html"><a href="1-getting-started.html#programming-concepts"><i class="fa fa-check"></i><b>1.2.1</b> Basic programming concepts and terminology</a></li>
<li class="chapter" data-level="1.2.2" data-path="1-getting-started.html"><a href="1-getting-started.html#messages"><i class="fa fa-check"></i><b>1.2.2</b> Errors, warnings, and messages</a></li>
<li class="chapter" data-level="1.2.3" data-path="1-getting-started.html"><a href="1-getting-started.html#tips-code"><i class="fa fa-check"></i><b>1.2.3</b> Tips on learning to code</a></li>
</ul></li>
<li class="chapter" data-level="1.3" data-path="1-getting-started.html"><a href="1-getting-started.html#packages"><i class="fa fa-check"></i><b>1.3</b> What are R packages?</a>
<ul>
<li class="chapter" data-level="1.3.1" data-path="1-getting-started.html"><a href="1-getting-started.html#package-installation"><i class="fa fa-check"></i><b>1.3.1</b> Package installation</a></li>
<li class="chapter" data-level="1.3.2" data-path="1-getting-started.html"><a href="1-getting-started.html#package-loading"><i class="fa fa-check"></i><b>1.3.2</b> Package loading</a></li>
<li class="chapter" data-level="1.3.3" data-path="1-getting-started.html"><a href="1-getting-started.html#package-use"><i class="fa fa-check"></i><b>1.3.3</b> Package use</a></li>
</ul></li>
<li class="chapter" data-level="1.4" data-path="1-getting-started.html"><a href="1-getting-started.html#rfishbase"><i class="fa fa-check"></i><b>1.4</b> Explore your first datasets</a>
<ul>
<li class="chapter" data-level="1.4.1" data-path="1-getting-started.html"><a href="1-getting-started.html#rfishpackage"><i class="fa fa-check"></i><b>1.4.1</b> <code>rfishbase</code> package</a></li>
<li class="chapter" data-level="1.4.2" data-path="1-getting-started.html"><a href="1-getting-started.html#fishbasedataframe"><i class="fa fa-check"></i><b>1.4.2</b> <code>fishbase</code> data frame</a></li>
<li class="chapter" data-level="1.4.3" data-path="1-getting-started.html"><a href="1-getting-started.html#exploredataframes"><i class="fa fa-check"></i><b>1.4.3</b> Exploring data frames</a></li>
<li class="chapter" data-level="1.4.4" data-path="1-getting-started.html"><a href="1-getting-started.html#identification-vs-measurement-variables"><i class="fa fa-check"></i><b>1.4.4</b> Identification and measurement variables</a></li>
<li class="chapter" data-level="1.4.5" data-path="1-getting-started.html"><a href="1-getting-started.html#help-files"><i class="fa fa-check"></i><b>1.4.5</b> Help files</a></li>
</ul></li>
<li class="chapter" data-level="1.5" data-path="1-getting-started.html"><a href="1-getting-started.html#conclusion"><i class="fa fa-check"></i><b>1.5</b> Conclusion</a>
<ul>
<li class="chapter" data-level="1.5.1" data-path="1-getting-started.html"><a href="1-getting-started.html#additional-resources"><i class="fa fa-check"></i><b>1.5.1</b> Additional resources</a></li>
<li class="chapter" data-level="1.5.2" data-path="1-getting-started.html"><a href="1-getting-started.html#whats-to-come"><i class="fa fa-check"></i><b>1.5.2</b> What’s to come?</a></li>
</ul></li>
</ul></li>
<li class="part"><span><b>I Data Science with tidyverse</b></span></li>
<li class="chapter" data-level="2" data-path="2-viz.html"><a href="2-viz.html"><i class="fa fa-check"></i><b>2</b> Data Visualization</a>
<ul>
<li class="chapter" data-level="" data-path="2-viz.html"><a href="2-viz.html#needed-packages"><i class="fa fa-check"></i>Needed packages</a></li>
<li class="chapter" data-level="2.1" data-path="2-viz.html"><a href="2-viz.html#grammarofgraphics"><i class="fa fa-check"></i><b>2.1</b> The grammar of graphics</a>
<ul>
<li class="chapter" data-level="2.1.1" data-path="2-viz.html"><a href="2-viz.html#components-of-the-grammar"><i class="fa fa-check"></i><b>2.1.1</b> Components of the grammar</a></li>
<li class="chapter" data-level="2.1.2" data-path="2-viz.html"><a href="2-viz.html#gapminder"><i class="fa fa-check"></i><b>2.1.2</b> Gapminder data</a></li>
<li class="chapter" data-level="2.1.3" data-path="2-viz.html"><a href="2-viz.html#other-components"><i class="fa fa-check"></i><b>2.1.3</b> Other components</a></li>
<li class="chapter" data-level="2.1.4" data-path="2-viz.html"><a href="2-viz.html#ggplot2-package"><i class="fa fa-check"></i><b>2.1.4</b> ggplot2 package</a></li>
</ul></li>
<li class="chapter" data-level="2.2" data-path="2-viz.html"><a href="2-viz.html#FiveNG"><i class="fa fa-check"></i><b>2.2</b> Five named graphs - the 5NG</a></li>
<li class="chapter" data-level="2.3" data-path="2-viz.html"><a href="2-viz.html#scatterplots"><i class="fa fa-check"></i><b>2.3</b> 5NG#1: Scatterplots</a>
<ul>
<li class="chapter" data-level="2.3.1" data-path="2-viz.html"><a href="2-viz.html#geompoint"><i class="fa fa-check"></i><b>2.3.1</b> Scatterplots via <code>geom_point</code></a></li>
<li class="chapter" data-level="2.3.2" data-path="2-viz.html"><a href="2-viz.html#overplotting"><i class="fa fa-check"></i><b>2.3.2</b> Overplotting</a></li>
<li class="chapter" data-level="2.3.3" data-path="2-viz.html"><a href="2-viz.html#summary"><i class="fa fa-check"></i><b>2.3.3</b> Summary</a></li>
</ul></li>
<li class="chapter" data-level="2.4" data-path="2-viz.html"><a href="2-viz.html#linegraphs"><i class="fa fa-check"></i><b>2.4</b> 5NG#2: Linegraphs</a>
<ul>
<li class="chapter" data-level="2.4.1" data-path="2-viz.html"><a href="2-viz.html#geomline"><i class="fa fa-check"></i><b>2.4.1</b> Linegraphs via <code>geom_line</code></a></li>
<li class="chapter" data-level="2.4.2" data-path="2-viz.html"><a href="2-viz.html#summary-1"><i class="fa fa-check"></i><b>2.4.2</b> Summary</a></li>
</ul></li>
<li class="chapter" data-level="2.5" data-path="2-viz.html"><a href="2-viz.html#facets"><i class="fa fa-check"></i><b>2.5</b> Facets</a></li>
<li class="chapter" data-level="2.6" data-path="2-viz.html"><a href="2-viz.html#histograms"><i class="fa fa-check"></i><b>2.6</b> 5NG#3: Histograms</a>
<ul>
<li class="chapter" data-level="2.6.1" data-path="2-viz.html"><a href="2-viz.html#geomhistogram"><i class="fa fa-check"></i><b>2.6.1</b> Histograms via <code>geom_histogram</code></a></li>
<li class="chapter" data-level="2.6.2" data-path="2-viz.html"><a href="2-viz.html#adjustbins"><i class="fa fa-check"></i><b>2.6.2</b> Adjusting the bins</a></li>
<li class="chapter" data-level="2.6.3" data-path="2-viz.html"><a href="2-viz.html#summary-2"><i class="fa fa-check"></i><b>2.6.3</b> Summary</a></li>
</ul></li>
<li class="chapter" data-level="2.7" data-path="2-viz.html"><a href="2-viz.html#boxplots"><i class="fa fa-check"></i><b>2.7</b> 5NG#4: Boxplots</a>
<ul>
<li class="chapter" data-level="2.7.1" data-path="2-viz.html"><a href="2-viz.html#geomboxplot"><i class="fa fa-check"></i><b>2.7.1</b> Boxplots via <code>geom_boxplot</code></a></li>
<li class="chapter" data-level="2.7.2" data-path="2-viz.html"><a href="2-viz.html#summary-3"><i class="fa fa-check"></i><b>2.7.2</b> Summary</a></li>
</ul></li>
<li class="chapter" data-level="2.8" data-path="2-viz.html"><a href="2-viz.html#geombar"><i class="fa fa-check"></i><b>2.8</b> 5NG#5: Barplots</a>
<ul>
<li class="chapter" data-level="2.8.1" data-path="2-viz.html"><a href="2-viz.html#barplots-via-geom_bar-or-geom_col"><i class="fa fa-check"></i><b>2.8.1</b> Barplots via <code>geom_bar</code> or <code>geom_col</code></a></li>
<li class="chapter" data-level="2.8.2" data-path="2-viz.html"><a href="2-viz.html#must-avoid-pie-charts"><i class="fa fa-check"></i><b>2.8.2</b> Must avoid pie charts!</a></li>
<li class="chapter" data-level="2.8.3" data-path="2-viz.html"><a href="2-viz.html#two-categ-barplot"><i class="fa fa-check"></i><b>2.8.3</b> Two categorical variables</a></li>
<li class="chapter" data-level="2.8.4" data-path="2-viz.html"><a href="2-viz.html#summary-4"><i class="fa fa-check"></i><b>2.8.4</b> Summary</a></li>
</ul></li>
<li class="chapter" data-level="2.9" data-path="2-viz.html"><a href="2-viz.html#data-vis-conclusion"><i class="fa fa-check"></i><b>2.9</b> Conclusion</a>
<ul>
<li class="chapter" data-level="2.9.1" data-path="2-viz.html"><a href="2-viz.html#summary-table"><i class="fa fa-check"></i><b>2.9.1</b> Summary table</a></li>
<li class="chapter" data-level="2.9.2" data-path="2-viz.html"><a href="2-viz.html#function-argument-specification"><i class="fa fa-check"></i><b>2.9.2</b> Function argument specification</a></li>
<li class="chapter" data-level="2.9.3" data-path="2-viz.html"><a href="2-viz.html#additional-resources-1"><i class="fa fa-check"></i><b>2.9.3</b> Additional resources</a></li>
<li class="chapter" data-level="2.9.4" data-path="2-viz.html"><a href="2-viz.html#whats-to-come-3"><i class="fa fa-check"></i><b>2.9.4</b> What’s to come</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="3" data-path="3-wrangling.html"><a href="3-wrangling.html"><i class="fa fa-check"></i><b>3</b> Data Wrangling</a>
<ul>
<li class="chapter" data-level="" data-path="3-wrangling.html"><a href="3-wrangling.html#wrangling-packages"><i class="fa fa-check"></i>Needed packages</a></li>
<li class="chapter" data-level="3.1" data-path="3-wrangling.html"><a href="3-wrangling.html#piping"><i class="fa fa-check"></i><b>3.1</b> The pipe operator: <code>%>%</code></a></li>
<li class="chapter" data-level="3.2" data-path="3-wrangling.html"><a href="3-wrangling.html#filter"><i class="fa fa-check"></i><b>3.2</b> <code>filter</code> rows</a></li>
<li class="chapter" data-level="3.3" data-path="3-wrangling.html"><a href="3-wrangling.html#slice-rows"><i class="fa fa-check"></i><b>3.3</b> <code>slice</code> rows</a></li>
<li class="chapter" data-level="3.4" data-path="3-wrangling.html"><a href="3-wrangling.html#select"><i class="fa fa-check"></i><b>3.4</b> <code>select</code> variables</a>
<ul>
<li class="chapter" data-level="3.4.1" data-path="3-wrangling.html"><a href="3-wrangling.html#rename"><i class="fa fa-check"></i><b>3.4.1</b> <code>rename</code> variables</a></li>
</ul></li>
<li class="chapter" data-level="3.5" data-path="3-wrangling.html"><a href="3-wrangling.html#summarize"><i class="fa fa-check"></i><b>3.5</b> <code>summarize</code> variables</a></li>
<li class="chapter" data-level="3.6" data-path="3-wrangling.html"><a href="3-wrangling.html#groupby"><i class="fa fa-check"></i><b>3.6</b> <code>group_by</code> rows</a>
<ul>
<li class="chapter" data-level="3.6.1" data-path="3-wrangling.html"><a href="3-wrangling.html#grouping-by-more-than-one-variable"><i class="fa fa-check"></i><b>3.6.1</b> Grouping by more than one variable</a></li>
</ul></li>
<li class="chapter" data-level="3.7" data-path="3-wrangling.html"><a href="3-wrangling.html#mutate"><i class="fa fa-check"></i><b>3.7</b> <code>mutate</code> existing variables</a></li>
<li class="chapter" data-level="3.8" data-path="3-wrangling.html"><a href="3-wrangling.html#arrange"><i class="fa fa-check"></i><b>3.8</b> <code>arrange</code> and sort rows</a></li>
<li class="chapter" data-level="3.9" data-path="3-wrangling.html"><a href="3-wrangling.html#joins"><i class="fa fa-check"></i><b>3.9</b> <code>join</code> data frames</a></li>
<li class="chapter" data-level="3.10" data-path="3-wrangling.html"><a href="3-wrangling.html#wrangling-conclusion"><i class="fa fa-check"></i><b>3.10</b> Conclusion</a>
<ul>
<li class="chapter" data-level="3.10.1" data-path="3-wrangling.html"><a href="3-wrangling.html#summary-table-1"><i class="fa fa-check"></i><b>3.10.1</b> Summary table</a></li>
<li class="chapter" data-level="3.10.2" data-path="3-wrangling.html"><a href="3-wrangling.html#additional-resources-2"><i class="fa fa-check"></i><b>3.10.2</b> Additional resources</a></li>
<li class="chapter" data-level="3.10.3" data-path="3-wrangling.html"><a href="3-wrangling.html#whats-to-come-1"><i class="fa fa-check"></i><b>3.10.3</b> What’s to come?</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="4" data-path="4-tidy.html"><a href="4-tidy.html"><i class="fa fa-check"></i><b>4</b> Data Importing and “Tidy” Data</a>
<ul>
<li class="chapter" data-level="" data-path="4-tidy.html"><a href="4-tidy.html#tidy-packages"><i class="fa fa-check"></i>Needed packages</a></li>
<li class="chapter" data-level="4.1" data-path="4-tidy.html"><a href="4-tidy.html#csv"><i class="fa fa-check"></i><b>4.1</b> Importing data</a>
<ul>
<li class="chapter" data-level="4.1.1" data-path="4-tidy.html"><a href="4-tidy.html#using-the-console"><i class="fa fa-check"></i><b>4.1.1</b> Using the console</a></li>
<li class="chapter" data-level="4.1.2" data-path="4-tidy.html"><a href="4-tidy.html#using-rstudios-interface"><i class="fa fa-check"></i><b>4.1.2</b> Using RStudio’s interface</a></li>
</ul></li>
<li class="chapter" data-level="4.2" data-path="4-tidy.html"><a href="4-tidy.html#tidy-data-ex"><i class="fa fa-check"></i><b>4.2</b> “Tidy” data</a>
<ul>
<li class="chapter" data-level="4.2.1" data-path="4-tidy.html"><a href="4-tidy.html#tidy-definition"><i class="fa fa-check"></i><b>4.2.1</b> Definition of “tidy” data</a></li>
<li class="chapter" data-level="4.2.2" data-path="4-tidy.html"><a href="4-tidy.html#converting-to-tidy-data"><i class="fa fa-check"></i><b>4.2.2</b> Converting to “tidy” data</a></li>
</ul></li>
<li class="chapter" data-level="4.3" data-path="4-tidy.html"><a href="4-tidy.html#case-study-tidy"><i class="fa fa-check"></i><b>4.3</b> Case study: Weight loss data</a></li>
<li class="chapter" data-level="4.4" data-path="4-tidy.html"><a href="4-tidy.html#tidyverse-package"><i class="fa fa-check"></i><b>4.4</b> <code>tidyverse</code> package</a></li>
<li class="chapter" data-level="4.5" data-path="4-tidy.html"><a href="4-tidy.html#tidy-data-conclusion"><i class="fa fa-check"></i><b>4.5</b> Conclusion</a>
<ul>
<li class="chapter" data-level="4.5.1" data-path="4-tidy.html"><a href="4-tidy.html#additional-resources-3"><i class="fa fa-check"></i><b>4.5.1</b> Additional resources</a></li>
<li class="chapter" data-level="4.5.2" data-path="4-tidy.html"><a href="4-tidy.html#whats-to-come-2"><i class="fa fa-check"></i><b>4.5.2</b> What’s to come?</a></li>
</ul></li>
</ul></li>
<li class="part"><span><b>II Data Modeling with moderndive</b></span></li>
<li class="chapter" data-level="5" data-path="5-regression.html"><a href="5-regression.html"><i class="fa fa-check"></i><b>5</b> Basic Regression</a>
<ul>
<li class="chapter" data-level="" data-path="5-regression.html"><a href="5-regression.html#reg-packages"><i class="fa fa-check"></i>Needed packages</a></li>
<li class="chapter" data-level="5.1" data-path="5-regression.html"><a href="5-regression.html#model1"><i class="fa fa-check"></i><b>5.1</b> One numerical explanatory variable</a>
<ul>
<li class="chapter" data-level="5.1.1" data-path="5-regression.html"><a href="5-regression.html#model1EDA"><i class="fa fa-check"></i><b>5.1.1</b> Exploratory data analysis</a></li>
<li class="chapter" data-level="5.1.2" data-path="5-regression.html"><a href="5-regression.html#model1table"><i class="fa fa-check"></i><b>5.1.2</b> Simple linear regression</a></li>
<li class="chapter" data-level="5.1.3" data-path="5-regression.html"><a href="5-regression.html#model1points"><i class="fa fa-check"></i><b>5.1.3</b> Observed/fitted values and residuals</a></li>
</ul></li>
<li class="chapter" data-level="5.2" data-path="5-regression.html"><a href="5-regression.html#model2"><i class="fa fa-check"></i><b>5.2</b> One categorical explanatory variable</a>
<ul>
<li class="chapter" data-level="5.2.1" data-path="5-regression.html"><a href="5-regression.html#model2EDA"><i class="fa fa-check"></i><b>5.2.1</b> Exploratory data analysis</a></li>
<li class="chapter" data-level="5.2.2" data-path="5-regression.html"><a href="5-regression.html#model2table"><i class="fa fa-check"></i><b>5.2.2</b> Linear regression</a></li>
<li class="chapter" data-level="5.2.3" data-path="5-regression.html"><a href="5-regression.html#model2points"><i class="fa fa-check"></i><b>5.2.3</b> Observed/fitted values and residuals</a></li>
</ul></li>
<li class="chapter" data-level="5.3" data-path="5-regression.html"><a href="5-regression.html#reg-related-topics"><i class="fa fa-check"></i><b>5.3</b> Related topics</a>
<ul>
<li class="chapter" data-level="5.3.1" data-path="5-regression.html"><a href="5-regression.html#correlation-is-not-causation"><i class="fa fa-check"></i><b>5.3.1</b> Correlation is not necessarily causation</a></li>
<li class="chapter" data-level="5.3.2" data-path="5-regression.html"><a href="5-regression.html#leastsquares"><i class="fa fa-check"></i><b>5.3.2</b> Best-fitting line</a></li>
<li class="chapter" data-level="5.3.3" data-path="5-regression.html"><a href="5-regression.html#underthehood"><i class="fa fa-check"></i><b>5.3.3</b> <code>get_regression_x()</code> functions</a></li>
</ul></li>
<li class="chapter" data-level="5.4" data-path="5-regression.html"><a href="5-regression.html#reg-conclusion"><i class="fa fa-check"></i><b>5.4</b> Conclusion</a>
<ul>
<li class="chapter" data-level="5.4.1" data-path="5-regression.html"><a href="5-regression.html#additional-resources-basic-regression"><i class="fa fa-check"></i><b>5.4.1</b> Additional resources</a></li>
<li class="chapter" data-level="5.4.2" data-path="5-regression.html"><a href="5-regression.html#whats-to-come-4"><i class="fa fa-check"></i><b>5.4.2</b> What’s to come?</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="6" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html"><i class="fa fa-check"></i><b>6</b> Multiple Regression</a>
<ul>
<li class="chapter" data-level="" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#mult-reg-packages"><i class="fa fa-check"></i>Needed packages</a></li>
<li class="chapter" data-level="6.1" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#model4"><i class="fa fa-check"></i><b>6.1</b> One numerical and one categorical explanatory variable</a>
<ul>
<li class="chapter" data-level="6.1.1" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#model4EDA"><i class="fa fa-check"></i><b>6.1.1</b> Exploratory data analysis</a></li>
<li class="chapter" data-level="6.1.2" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#model4interactiontable"><i class="fa fa-check"></i><b>6.1.2</b> Interaction model</a></li>
<li class="chapter" data-level="6.1.3" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#model4table"><i class="fa fa-check"></i><b>6.1.3</b> Parallel slopes model</a></li>
<li class="chapter" data-level="6.1.4" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#model4points"><i class="fa fa-check"></i><b>6.1.4</b> Observed/fitted values and residuals</a></li>
</ul></li>
<li class="chapter" data-level="6.2" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#model3"><i class="fa fa-check"></i><b>6.2</b> Two categorical explanatory variables</a>
<ul>
<li class="chapter" data-level="6.2.1" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#model3EDA"><i class="fa fa-check"></i><b>6.2.1</b> Exploratory data analysis</a></li>
<li class="chapter" data-level="6.2.2" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#model3table"><i class="fa fa-check"></i><b>6.2.2</b> Regression lines</a></li>
<li class="chapter" data-level="6.2.3" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#model3points"><i class="fa fa-check"></i><b>6.2.3</b> Observed/fitted values and residuals</a></li>
</ul></li>
<li class="chapter" data-level="6.3" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#mult-reg-related-topics"><i class="fa fa-check"></i><b>6.3</b> Related topics</a>
<ul>
<li class="chapter" data-level="6.3.1" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#model-selection"><i class="fa fa-check"></i><b>6.3.1</b> Model selection using visualizations</a></li>
<li class="chapter" data-level="6.3.2" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#rsquared"><i class="fa fa-check"></i><b>6.3.2</b> Model selection using R-squared</a></li>
</ul></li>
<li class="chapter" data-level="6.4" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#mult-reg-conclusion"><i class="fa fa-check"></i><b>6.4</b> Conclusion</a>
<ul>
<li class="chapter" data-level="6.4.1" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#additional-resources-4"><i class="fa fa-check"></i><b>6.4.1</b> Additional resources</a></li>
<li class="chapter" data-level="6.4.2" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#whats-to-come-5"><i class="fa fa-check"></i><b>6.4.2</b> What’s to come?</a></li>
</ul></li>
</ul></li>
<li class="part"><span><b>III Statistical Inference with infer</b></span></li>
<li class="chapter" data-level="7" data-path="7-sampling.html"><a href="7-sampling.html"><i class="fa fa-check"></i><b>7</b> Sampling</a>
<ul>
<li class="chapter" data-level="" data-path="7-sampling.html"><a href="7-sampling.html#sampling-packages"><i class="fa fa-check"></i>Needed packages</a></li>
<li class="chapter" data-level="7.1" data-path="7-sampling.html"><a href="7-sampling.html#sampling-activity"><i class="fa fa-check"></i><b>7.1</b> Sampling bowl activity</a>
<ul>
<li class="chapter" data-level="7.1.1" data-path="7-sampling.html"><a href="7-sampling.html#what-proportion-of-this-bowls-balls-are-red"><i class="fa fa-check"></i><b>7.1.1</b> What proportion of this bowl’s balls are red?</a></li>
<li class="chapter" data-level="7.1.2" data-path="7-sampling.html"><a href="7-sampling.html#using-the-shovel-once"><i class="fa fa-check"></i><b>7.1.2</b> Using the shovel once</a></li>
<li class="chapter" data-level="7.1.3" data-path="7-sampling.html"><a href="7-sampling.html#student-shovels"><i class="fa fa-check"></i><b>7.1.3</b> Using the shovel 33 times</a></li>
<li class="chapter" data-level="7.1.4" data-path="7-sampling.html"><a href="7-sampling.html#sampling-what-did-we-just-do"><i class="fa fa-check"></i><b>7.1.4</b> What did we just do?</a></li>
</ul></li>
<li class="chapter" data-level="7.2" data-path="7-sampling.html"><a href="7-sampling.html#sampling-simulation"><i class="fa fa-check"></i><b>7.2</b> Virtual sampling</a>
<ul>
<li class="chapter" data-level="7.2.1" data-path="7-sampling.html"><a href="7-sampling.html#using-the-virtual-shovel-once"><i class="fa fa-check"></i><b>7.2.1</b> Using the virtual shovel once</a></li>
</ul></li>
<li class="chapter" data-level="7.3" data-path="7-sampling.html"><a href="7-sampling.html#sampling-framework"><i class="fa fa-check"></i><b>7.3</b> Sampling framework</a>
<ul>
<li class="chapter" data-level="7.3.1" data-path="7-sampling.html"><a href="7-sampling.html#terminology-and-notation"><i class="fa fa-check"></i><b>7.3.1</b> Terminology and notation</a></li>
<li class="chapter" data-level="7.3.2" data-path="7-sampling.html"><a href="7-sampling.html#sampling-definitions"><i class="fa fa-check"></i><b>7.3.2</b> Statistical definitions</a></li>
<li class="chapter" data-level="7.3.3" data-path="7-sampling.html"><a href="7-sampling.html#moral-of-the-story"><i class="fa fa-check"></i><b>7.3.3</b> The moral of the story</a></li>
</ul></li>
<li class="chapter" data-level="7.4" data-path="7-sampling.html"><a href="7-sampling.html#sampling-case-study"><i class="fa fa-check"></i><b>7.4</b> Case study: Polls</a></li>
<li class="chapter" data-level="7.5" data-path="7-sampling.html"><a href="7-sampling.html#sampling-conclusion-central-limit-theorem"><i class="fa fa-check"></i><b>7.5</b> Central Limit Theorem</a></li>
<li class="chapter" data-level="7.6" data-path="7-sampling.html"><a href="7-sampling.html#sampling-conclusion"><i class="fa fa-check"></i><b>7.6</b> Conclusion</a>
<ul>
<li class="chapter" data-level="7.6.1" data-path="7-sampling.html"><a href="7-sampling.html#sampling-conclusion-table"><i class="fa fa-check"></i><b>7.6.1</b> Sampling scenarios</a></li>
<li class="chapter" data-level="7.6.2" data-path="7-sampling.html"><a href="7-sampling.html#additional-resources-5"><i class="fa fa-check"></i><b>7.6.2</b> Additional resources</a></li>
<li class="chapter" data-level="7.6.3" data-path="7-sampling.html"><a href="7-sampling.html#whats-to-come-6"><i class="fa fa-check"></i><b>7.6.3</b> What’s to come?</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="8" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html"><i class="fa fa-check"></i><b>8</b> Bootstrapping and Confidence Intervals</a>
<ul>
<li class="chapter" data-level="" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#CI-packages"><i class="fa fa-check"></i>Needed packages</a></li>
<li class="chapter" data-level="8.1" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#resampling-tactile"><i class="fa fa-check"></i><b>8.1</b> Pennies activity</a>
<ul>
<li class="chapter" data-level="8.1.1" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#what-is-the-average-year-on-us-pennies-in-2019"><i class="fa fa-check"></i><b>8.1.1</b> What is the average year on US pennies in 2019?</a></li>
<li class="chapter" data-level="8.1.2" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#resampling-once"><i class="fa fa-check"></i><b>8.1.2</b> Resampling once</a></li>
<li class="chapter" data-level="8.1.3" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#student-resamples"><i class="fa fa-check"></i><b>8.1.3</b> Resampling 35 times</a></li>
<li class="chapter" data-level="8.1.4" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#ci-what-did-we-just-do"><i class="fa fa-check"></i><b>8.1.4</b> What did we just do?</a></li>
</ul></li>
<li class="chapter" data-level="8.2" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#resampling-simulation"><i class="fa fa-check"></i><b>8.2</b> Computer simulation of resampling</a>
<ul>
<li class="chapter" data-level="8.2.1" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#virtually-resampling-once"><i class="fa fa-check"></i><b>8.2.1</b> Virtually resampling once</a></li>
<li class="chapter" data-level="8.2.2" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#bootstrap-35-replicates"><i class="fa fa-check"></i><b>8.2.2</b> Virtually resampling 35 times</a></li>
<li class="chapter" data-level="8.2.3" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#bootstrap-1000-replicates"><i class="fa fa-check"></i><b>8.2.3</b> Virtually resampling 1000 times</a></li>
</ul></li>
<li class="chapter" data-level="8.3" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#ci-build-up"><i class="fa fa-check"></i><b>8.3</b> Understanding confidence intervals</a>
<ul>
<li class="chapter" data-level="8.3.1" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#percentile-method"><i class="fa fa-check"></i><b>8.3.1</b> Percentile method</a></li>
<li class="chapter" data-level="8.3.2" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#se-method"><i class="fa fa-check"></i><b>8.3.2</b> Standard error method</a></li>
</ul></li>
<li class="chapter" data-level="8.4" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#bootstrap-process"><i class="fa fa-check"></i><b>8.4</b> Constructing confidence intervals</a>
<ul>
<li class="chapter" data-level="8.4.1" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#original-workflow"><i class="fa fa-check"></i><b>8.4.1</b> Original workflow</a></li>
<li class="chapter" data-level="8.4.2" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#infer-workflow"><i class="fa fa-check"></i><b>8.4.2</b> <code>infer</code> package workflow</a></li>
<li class="chapter" data-level="8.4.3" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#percentile-method-infer"><i class="fa fa-check"></i><b>8.4.3</b> Percentile method with <code>infer</code></a></li>
<li class="chapter" data-level="8.4.4" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#infer-se"><i class="fa fa-check"></i><b>8.4.4</b> Standard error method with <code>infer</code></a></li>
</ul></li>
<li class="chapter" data-level="8.5" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#one-prop-ci"><i class="fa fa-check"></i><b>8.5</b> Interpreting confidence intervals</a>
<ul>
<li class="chapter" data-level="8.5.1" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#ilyas-yohan"><i class="fa fa-check"></i><b>8.5.1</b> Did the net capture the fish?</a></li>
<li class="chapter" data-level="8.5.2" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#shorthand"><i class="fa fa-check"></i><b>8.5.2</b> Precise and shorthand interpretation</a></li>
<li class="chapter" data-level="8.5.3" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#ci-width"><i class="fa fa-check"></i><b>8.5.3</b> Width of confidence intervals</a></li>
</ul></li>
<li class="chapter" data-level="8.6" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#case-study-two-prop-ci"><i class="fa fa-check"></i><b>8.6</b> Case study: Is yawning contagious?</a>
<ul>
<li class="chapter" data-level="8.6.1" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#mythbusters-study-data"><i class="fa fa-check"></i><b>8.6.1</b> <em>Mythbusters</em> study data</a></li>
<li class="chapter" data-level="8.6.2" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#sampling-scenario"><i class="fa fa-check"></i><b>8.6.2</b> Sampling scenario</a></li>
<li class="chapter" data-level="8.6.3" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#ci-build"><i class="fa fa-check"></i><b>8.6.3</b> Constructing the confidence interval</a></li>
<li class="chapter" data-level="8.6.4" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#interpreting-the-confidence-interval"><i class="fa fa-check"></i><b>8.6.4</b> Interpreting the confidence interval</a></li>
</ul></li>
<li class="chapter" data-level="8.7" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#ci-conclusion"><i class="fa fa-check"></i><b>8.7</b> Conclusion</a>
<ul>
<li class="chapter" data-level="8.7.1" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#bootstrap-vs-sampling"><i class="fa fa-check"></i><b>8.7.1</b> Comparing bootstrap and sampling distributions</a></li>
<li class="chapter" data-level="8.7.2" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#theory-ci"><i class="fa fa-check"></i><b>8.7.2</b> Theory-based confidence intervals</a></li>
<li class="chapter" data-level="8.7.3" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#additional-resources-6"><i class="fa fa-check"></i><b>8.7.3</b> Additional resources</a></li>
<li class="chapter" data-level="8.7.4" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#whats-to-come-7"><i class="fa fa-check"></i><b>8.7.4</b> What’s to come?</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="9" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html"><i class="fa fa-check"></i><b>9</b> Hypothesis Testing</a>
<ul>
<li class="chapter" data-level="" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#nhst-packages"><i class="fa fa-check"></i>Needed packages</a></li>
<li class="chapter" data-level="9.1" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#ht-activity"><i class="fa fa-check"></i><b>9.1</b> Promotions activity</a>
<ul>
<li class="chapter" data-level="9.1.1" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#does-gender-affect-promotions-at-a-bank"><i class="fa fa-check"></i><b>9.1.1</b> Does gender affect promotions at a bank?</a></li>
<li class="chapter" data-level="9.1.2" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#shuffling-once"><i class="fa fa-check"></i><b>9.1.2</b> Shuffling once</a></li>
<li class="chapter" data-level="9.1.3" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#shuffling-16-times"><i class="fa fa-check"></i><b>9.1.3</b> Shuffling 16 times</a></li>
<li class="chapter" data-level="9.1.4" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#ht-what-did-we-just-do"><i class="fa fa-check"></i><b>9.1.4</b> What did we just do?</a></li>
</ul></li>
<li class="chapter" data-level="9.2" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#understanding-ht"><i class="fa fa-check"></i><b>9.2</b> Understanding hypothesis tests</a></li>
<li class="chapter" data-level="9.3" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#ht-infer"><i class="fa fa-check"></i><b>9.3</b> Conducting hypothesis tests</a>
<ul>
<li class="chapter" data-level="9.3.1" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#infer-workflow-ht"><i class="fa fa-check"></i><b>9.3.1</b> <code>infer</code> package workflow</a></li>
<li class="chapter" data-level="9.3.2" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#comparing-infer-workflows"><i class="fa fa-check"></i><b>9.3.2</b> Comparison with confidence intervals</a></li>
<li class="chapter" data-level="9.3.3" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#only-one-test"><i class="fa fa-check"></i><b>9.3.3</b> “There is only one test”</a></li>
</ul></li>
<li class="chapter" data-level="9.4" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#ht-interpretation"><i class="fa fa-check"></i><b>9.4</b> Interpreting hypothesis tests</a>
<ul>
<li class="chapter" data-level="9.4.1" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#trial"><i class="fa fa-check"></i><b>9.4.1</b> Two possible outcomes</a></li>
<li class="chapter" data-level="9.4.2" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#types-of-errors"><i class="fa fa-check"></i><b>9.4.2</b> Types of errors</a></li>
<li class="chapter" data-level="9.4.3" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#choosing-alpha"><i class="fa fa-check"></i><b>9.4.3</b> How do we choose alpha?</a></li>
</ul></li>
<li class="chapter" data-level="9.5" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#ht-case-study"><i class="fa fa-check"></i><b>9.5</b> Case study: Are action or romance movies rated higher?</a>
<ul>
<li class="chapter" data-level="9.5.1" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#imdb-data"><i class="fa fa-check"></i><b>9.5.1</b> IMDb ratings data</a></li>
<li class="chapter" data-level="9.5.2" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#sampling-scenario-1"><i class="fa fa-check"></i><b>9.5.2</b> Sampling scenario</a></li>
<li class="chapter" data-level="9.5.3" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#conducting-the-hypothesis-test"><i class="fa fa-check"></i><b>9.5.3</b> Conducting the hypothesis test</a></li>
</ul></li>
<li class="chapter" data-level="9.6" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#nhst-conclusion"><i class="fa fa-check"></i><b>9.6</b> Conclusion</a>
<ul>
<li class="chapter" data-level="9.6.1" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#theory-hypo"><i class="fa fa-check"></i><b>9.6.1</b> Theory-based hypothesis tests</a></li>
<li class="chapter" data-level="9.6.2" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#when-inference-is-not-needed"><i class="fa fa-check"></i><b>9.6.2</b> When inference is not needed</a></li>
<li class="chapter" data-level="9.6.3" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#problems-with-p-values"><i class="fa fa-check"></i><b>9.6.3</b> Problems with p-values</a></li>
<li class="chapter" data-level="9.6.4" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#additional-resources-7"><i class="fa fa-check"></i><b>9.6.4</b> Additional resources</a></li>
<li class="chapter" data-level="9.6.5" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#whats-to-come-8"><i class="fa fa-check"></i><b>9.6.5</b> What’s to come?</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="10" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html"><i class="fa fa-check"></i><b>10</b> Inference for Regression</a>
<ul>
<li class="chapter" data-level="" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#inf-packages"><i class="fa fa-check"></i>Needed packages</a></li>
<li class="chapter" data-level="10.1" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#regression-refresher"><i class="fa fa-check"></i><b>10.1</b> Regression refresher</a>
<ul>
<li class="chapter" data-level="10.1.1" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#teaching-evaluations-analysis"><i class="fa fa-check"></i><b>10.1.1</b> Teaching evaluations analysis</a></li>
<li class="chapter" data-level="10.1.2" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#sampling-scenario-2"><i class="fa fa-check"></i><b>10.1.2</b> Sampling scenario</a></li>
</ul></li>
<li class="chapter" data-level="10.2" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#regression-interp"><i class="fa fa-check"></i><b>10.2</b> Interpreting regression tables</a>
<ul>
<li class="chapter" data-level="10.2.1" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#regression-se"><i class="fa fa-check"></i><b>10.2.1</b> Standard error</a></li>
<li class="chapter" data-level="10.2.2" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#regression-test-statistic"><i class="fa fa-check"></i><b>10.2.2</b> Test statistic</a></li>
<li class="chapter" data-level="10.2.3" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#p-value"><i class="fa fa-check"></i><b>10.2.3</b> p-value</a></li>
<li class="chapter" data-level="10.2.4" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#confidence-interval"><i class="fa fa-check"></i><b>10.2.4</b> Confidence interval</a></li>
<li class="chapter" data-level="10.2.5" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#regression-table-computation"><i class="fa fa-check"></i><b>10.2.5</b> How does R compute the table?</a></li>
</ul></li>
<li class="chapter" data-level="10.3" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#regression-conditions"><i class="fa fa-check"></i><b>10.3</b> Conditions for inference for regression</a>
<ul>
<li class="chapter" data-level="10.3.1" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#residuals-refresher"><i class="fa fa-check"></i><b>10.3.1</b> Residuals refresher</a></li>
<li class="chapter" data-level="10.3.2" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#linearity-of-relationship"><i class="fa fa-check"></i><b>10.3.2</b> Linearity of relationship</a></li>
<li class="chapter" data-level="10.3.3" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#independence-of-residuals"><i class="fa fa-check"></i><b>10.3.3</b> Independence of residuals</a></li>
<li class="chapter" data-level="10.3.4" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#normality-of-residuals"><i class="fa fa-check"></i><b>10.3.4</b> Normality of residuals</a></li>
<li class="chapter" data-level="10.3.5" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#equality-of-variance"><i class="fa fa-check"></i><b>10.3.5</b> Equality of variance</a></li>
<li class="chapter" data-level="10.3.6" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#what-is-the-conclusion"><i class="fa fa-check"></i><b>10.3.6</b> What’s the conclusion?</a></li>
</ul></li>
<li class="chapter" data-level="10.4" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#infer-regression"><i class="fa fa-check"></i><b>10.4</b> Simulation-based inference for regression</a>
<ul>
<li class="chapter" data-level="10.4.1" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#confidence-interval-for-slope"><i class="fa fa-check"></i><b>10.4.1</b> Confidence interval for slope</a></li>
<li class="chapter" data-level="10.4.2" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#hypothesis-test-for-slope"><i class="fa fa-check"></i><b>10.4.2</b> Hypothesis test for slope</a></li>
</ul></li>
<li class="chapter" data-level="10.5" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#inference-conclusion"><i class="fa fa-check"></i><b>10.5</b> Conclusion</a>
<ul>
<li class="chapter" data-level="10.5.1" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#theory-regression"><i class="fa fa-check"></i><b>10.5.1</b> Theory-based inference for regression</a></li>
<li class="chapter" data-level="10.5.2" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#summary-of-statistical-inference"><i class="fa fa-check"></i><b>10.5.2</b> Summary of statistical inference</a></li>
<li class="chapter" data-level="10.5.3" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#additional-resources-8"><i class="fa fa-check"></i><b>10.5.3</b> Additional resources</a></li>
<li class="chapter" data-level="10.5.4" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#whats-to-come-9"><i class="fa fa-check"></i><b>10.5.4</b> What’s to come?</a></li>
</ul></li>
</ul></li>
<li class="part"><span><b>IV Conclusion</b></span></li>
<li class="chapter" data-level="11" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html"><i class="fa fa-check"></i><b>11</b> Tell Your Story with Data</a>
<ul>
<li class="chapter" data-level="11.1" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#review"><i class="fa fa-check"></i><b>11.1</b> Review</a>
<ul>
<li class="chapter" data-level="" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#story-packages"><i class="fa fa-check"></i>Needed packages</a></li>
</ul></li>
<li class="chapter" data-level="11.2" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#seattle-house-prices"><i class="fa fa-check"></i><b>11.2</b> Case study: Seattle house prices</a>
<ul>
<li class="chapter" data-level="11.2.1" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#house-prices-EDA-I"><i class="fa fa-check"></i><b>11.2.1</b> Exploratory data analysis: Part I</a></li>
<li class="chapter" data-level="11.2.2" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#house-prices-EDA-II"><i class="fa fa-check"></i><b>11.2.2</b> Exploratory data analysis: Part II</a></li>
<li class="chapter" data-level="11.2.3" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#house-prices-regression"><i class="fa fa-check"></i><b>11.2.3</b> Regression modeling</a></li>
<li class="chapter" data-level="11.2.4" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#house-prices-making-predictions"><i class="fa fa-check"></i><b>11.2.4</b> Making predictions</a></li>
</ul></li>
<li class="chapter" data-level="11.3" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#data-journalism"><i class="fa fa-check"></i><b>11.3</b> Case study: Effective data storytelling</a>
<ul>
<li class="chapter" data-level="11.3.1" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#bechdel-test-for-hollywood-gender-representation"><i class="fa fa-check"></i><b>11.3.1</b> Bechdel test for Hollywood gender representation</a></li>
<li class="chapter" data-level="11.3.2" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#us-births-in-1999"><i class="fa fa-check"></i><b>11.3.2</b> US Births in 1999</a></li>
<li class="chapter" data-level="11.3.3" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#scripts-of-r-code"><i class="fa fa-check"></i><b>11.3.3</b> Scripts of R code</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#concluding-remarks"><i class="fa fa-check"></i>Concluding remarks</a></li>
</ul></li>
<li class="appendix"><span><b>Appendix</b></span></li>
<li class="chapter" data-level="A" data-path="A-appendixA.html"><a href="A-appendixA.html"><i class="fa fa-check"></i><b>A</b> Statistical Background</a>
<ul>
<li class="chapter" data-level="A.1" data-path="A-appendixA.html"><a href="A-appendixA.html#appendix-stat-terms"><i class="fa fa-check"></i><b>A.1</b> Basic statistical terms</a>
<ul>
<li class="chapter" data-level="A.1.1" data-path="A-appendixA.html"><a href="A-appendixA.html#mean"><i class="fa fa-check"></i><b>A.1.1</b> Mean</a></li>
<li class="chapter" data-level="A.1.2" data-path="A-appendixA.html"><a href="A-appendixA.html#median"><i class="fa fa-check"></i><b>A.1.2</b> Median</a></li>
<li class="chapter" data-level="A.1.3" data-path="A-appendixA.html"><a href="A-appendixA.html#appendix-sd-variance"><i class="fa fa-check"></i><b>A.1.3</b> Standard deviation and variance</a></li>
<li class="chapter" data-level="A.1.4" data-path="A-appendixA.html"><a href="A-appendixA.html#five-number-summary"><i class="fa fa-check"></i><b>A.1.4</b> Five-number summary</a></li>
<li class="chapter" data-level="A.1.5" data-path="A-appendixA.html"><a href="A-appendixA.html#distribution"><i class="fa fa-check"></i><b>A.1.5</b> Distribution</a></li>
<li class="chapter" data-level="A.1.6" data-path="A-appendixA.html"><a href="A-appendixA.html#outliers"><i class="fa fa-check"></i><b>A.1.6</b> Outliers</a></li>
</ul></li>
<li class="chapter" data-level="A.2" data-path="A-appendixA.html"><a href="A-appendixA.html#appendix-normal-curve"><i class="fa fa-check"></i><b>A.2</b> Normal distribution</a></li>
<li class="chapter" data-level="A.3" data-path="A-appendixA.html"><a href="A-appendixA.html#appendix-log10-transformations"><i class="fa fa-check"></i><b>A.3</b> log10 transformations</a></li>
</ul></li>
<li class="chapter" data-level="B" data-path="B-appendixB.html"><a href="B-appendixB.html"><i class="fa fa-check"></i><b>B</b> Inference Examples</a>
<ul>
<li class="chapter" data-level="" data-path="B-appendixB.html"><a href="B-appendixB.html#needed-packages-1"><i class="fa fa-check"></i>Needed packages</a></li>
<li class="chapter" data-level="B.1" data-path="B-appendixB.html"><a href="B-appendixB.html#inference-mind-map"><i class="fa fa-check"></i><b>B.1</b> Inference mind map</a></li>
<li class="chapter" data-level="B.2" data-path="B-appendixB.html"><a href="B-appendixB.html#one-mean"><i class="fa fa-check"></i><b>B.2</b> One mean</a>
<ul>
<li class="chapter" data-level="B.2.1" data-path="B-appendixB.html"><a href="B-appendixB.html#problem-statement"><i class="fa fa-check"></i><b>B.2.1</b> Problem statement</a></li>
<li class="chapter" data-level="B.2.2" data-path="B-appendixB.html"><a href="B-appendixB.html#competing-hypotheses"><i class="fa fa-check"></i><b>B.2.2</b> Competing hypotheses</a></li>
<li class="chapter" data-level="B.2.3" data-path="B-appendixB.html"><a href="B-appendixB.html#exploring-the-sample-data"><i class="fa fa-check"></i><b>B.2.3</b> Exploring the sample data</a></li>
<li class="chapter" data-level="B.2.4" data-path="B-appendixB.html"><a href="B-appendixB.html#non-traditional-methods"><i class="fa fa-check"></i><b>B.2.4</b> Non-traditional methods</a></li>
<li class="chapter" data-level="B.2.5" data-path="B-appendixB.html"><a href="B-appendixB.html#traditional-methods"><i class="fa fa-check"></i><b>B.2.5</b> Traditional methods</a></li>
<li class="chapter" data-level="B.2.6" data-path="B-appendixB.html"><a href="B-appendixB.html#comparing-results"><i class="fa fa-check"></i><b>B.2.6</b> Comparing results</a></li>
</ul></li>
<li class="chapter" data-level="B.3" data-path="B-appendixB.html"><a href="B-appendixB.html#one-proportion"><i class="fa fa-check"></i><b>B.3</b> One proportion</a>
<ul>
<li class="chapter" data-level="B.3.1" data-path="B-appendixB.html"><a href="B-appendixB.html#problem-statement-1"><i class="fa fa-check"></i><b>B.3.1</b> Problem statement</a></li>
<li class="chapter" data-level="B.3.2" data-path="B-appendixB.html"><a href="B-appendixB.html#competing-hypotheses-1"><i class="fa fa-check"></i><b>B.3.2</b> Competing hypotheses</a></li>
<li class="chapter" data-level="B.3.3" data-path="B-appendixB.html"><a href="B-appendixB.html#exploring-the-sample-data-1"><i class="fa fa-check"></i><b>B.3.3</b> Exploring the sample data</a></li>
<li class="chapter" data-level="B.3.4" data-path="B-appendixB.html"><a href="B-appendixB.html#non-traditional-methods-1"><i class="fa fa-check"></i><b>B.3.4</b> Non-traditional methods</a></li>
<li class="chapter" data-level="B.3.5" data-path="B-appendixB.html"><a href="B-appendixB.html#traditional-methods-1"><i class="fa fa-check"></i><b>B.3.5</b> Traditional methods</a></li>
<li class="chapter" data-level="B.3.6" data-path="B-appendixB.html"><a href="B-appendixB.html#comparing-results-1"><i class="fa fa-check"></i><b>B.3.6</b> Comparing results</a></li>
</ul></li>
<li class="chapter" data-level="B.4" data-path="B-appendixB.html"><a href="B-appendixB.html#two-proportions"><i class="fa fa-check"></i><b>B.4</b> Two proportions</a>
<ul>
<li class="chapter" data-level="B.4.1" data-path="B-appendixB.html"><a href="B-appendixB.html#problem-statement-2"><i class="fa fa-check"></i><b>B.4.1</b> Problem statement</a></li>
<li class="chapter" data-level="B.4.2" data-path="B-appendixB.html"><a href="B-appendixB.html#competing-hypotheses-2"><i class="fa fa-check"></i><b>B.4.2</b> Competing hypotheses</a></li>
<li class="chapter" data-level="B.4.3" data-path="B-appendixB.html"><a href="B-appendixB.html#exploring-the-sample-data-2"><i class="fa fa-check"></i><b>B.4.3</b> Exploring the sample data</a></li>
<li class="chapter" data-level="B.4.4" data-path="B-appendixB.html"><a href="B-appendixB.html#non-traditional-methods-2"><i class="fa fa-check"></i><b>B.4.4</b> Non-traditional methods</a></li>
<li class="chapter" data-level="B.4.5" data-path="B-appendixB.html"><a href="B-appendixB.html#traditional-methods-2"><i class="fa fa-check"></i><b>B.4.5</b> Traditional methods</a></li>
<li class="chapter" data-level="B.4.6" data-path="B-appendixB.html"><a href="B-appendixB.html#test-statistic-2"><i class="fa fa-check"></i><b>B.4.6</b> Test statistic</a></li>
<li class="chapter" data-level="B.4.7" data-path="B-appendixB.html"><a href="B-appendixB.html#state-conclusion-2"><i class="fa fa-check"></i><b>B.4.7</b> State conclusion</a></li>
<li class="chapter" data-level="B.4.8" data-path="B-appendixB.html"><a href="B-appendixB.html#comparing-results-2"><i class="fa fa-check"></i><b>B.4.8</b> Comparing results</a></li>
</ul></li>
<li class="chapter" data-level="B.5" data-path="B-appendixB.html"><a href="B-appendixB.html#two-means-independent-samples"><i class="fa fa-check"></i><b>B.5</b> Two means (independent samples)</a>
<ul>
<li class="chapter" data-level="B.5.1" data-path="B-appendixB.html"><a href="B-appendixB.html#problem-statement-3"><i class="fa fa-check"></i><b>B.5.1</b> Problem statement</a></li>
<li class="chapter" data-level="B.5.2" data-path="B-appendixB.html"><a href="B-appendixB.html#competing-hypotheses-3"><i class="fa fa-check"></i><b>B.5.2</b> Competing hypotheses</a></li>
<li class="chapter" data-level="B.5.3" data-path="B-appendixB.html"><a href="B-appendixB.html#exploring-the-sample-data-3"><i class="fa fa-check"></i><b>B.5.3</b> Exploring the sample data</a></li>
<li class="chapter" data-level="B.5.4" data-path="B-appendixB.html"><a href="B-appendixB.html#non-traditional-methods-3"><i class="fa fa-check"></i><b>B.5.4</b> Non-traditional methods</a></li>
<li class="chapter" data-level="B.5.5" data-path="B-appendixB.html"><a href="B-appendixB.html#traditional-methods-3"><i class="fa fa-check"></i><b>B.5.5</b> Traditional methods</a></li>
<li class="chapter" data-level="B.5.6" data-path="B-appendixB.html"><a href="B-appendixB.html#test-statistic-3"><i class="fa fa-check"></i><b>B.5.6</b> Test statistic</a></li>
<li class="chapter" data-level="B.5.7" data-path="B-appendixB.html"><a href="B-appendixB.html#compute-p-value-1"><i class="fa fa-check"></i><b>B.5.7</b> Compute <span class="math inline">\(p\)</span>-value</a></li>
<li class="chapter" data-level="B.5.8" data-path="B-appendixB.html"><a href="B-appendixB.html#state-conclusion-3"><i class="fa fa-check"></i><b>B.5.8</b> State conclusion</a></li>
<li class="chapter" data-level="B.5.9" data-path="B-appendixB.html"><a href="B-appendixB.html#comparing-results-3"><i class="fa fa-check"></i><b>B.5.9</b> Comparing results</a></li>
</ul></li>
<li class="chapter" data-level="B.6" data-path="B-appendixB.html"><a href="B-appendixB.html#two-means-paired-samples"><i class="fa fa-check"></i><b>B.6</b> Two means (paired samples)</a>
<ul>
<li class="chapter" data-level="" data-path="B-appendixB.html"><a href="B-appendixB.html#problem-statement-4"><i class="fa fa-check"></i>Problem statement</a></li>
<li class="chapter" data-level="B.6.1" data-path="B-appendixB.html"><a href="B-appendixB.html#competing-hypotheses-4"><i class="fa fa-check"></i><b>B.6.1</b> Competing hypotheses</a></li>
<li class="chapter" data-level="B.6.2" data-path="B-appendixB.html"><a href="B-appendixB.html#exploring-the-sample-data-4"><i class="fa fa-check"></i><b>B.6.2</b> Exploring the sample data</a></li>
<li class="chapter" data-level="B.6.3" data-path="B-appendixB.html"><a href="B-appendixB.html#non-traditional-methods-4"><i class="fa fa-check"></i><b>B.6.3</b> Non-traditional methods</a></li>
<li class="chapter" data-level="B.6.4" data-path="B-appendixB.html"><a href="B-appendixB.html#traditional-methods-4"><i class="fa fa-check"></i><b>B.6.4</b> Traditional methods</a></li>
<li class="chapter" data-level="B.6.5" data-path="B-appendixB.html"><a href="B-appendixB.html#comparing-results-4"><i class="fa fa-check"></i><b>B.6.5</b> Comparing results</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="C" data-path="C-appendixC.html"><a href="C-appendixC.html"><i class="fa fa-check"></i><b>C</b> Tips and Tricks</a>
<ul>
<li class="chapter" data-level="" data-path="C-appendixC.html"><a href="C-appendixC.html#needed-packages-2"><i class="fa fa-check"></i>Needed packages</a></li>
<li class="chapter" data-level="C.1" data-path="C-appendixC.html"><a href="C-appendixC.html#data-wrangling"><i class="fa fa-check"></i><b>C.1</b> Data wrangling</a>
<ul>
<li class="chapter" data-level="C.1.1" data-path="C-appendixC.html"><a href="C-appendixC.html#appendix-missing-values"><i class="fa fa-check"></i><b>C.1.1</b> Dealing with missing values</a></li>
<li class="chapter" data-level="C.1.2" data-path="C-appendixC.html"><a href="C-appendixC.html#appendix-reordering-bars"><i class="fa fa-check"></i><b>C.1.2</b> Reordering bars in a barplot</a></li>
<li class="chapter" data-level="C.1.3" data-path="C-appendixC.html"><a href="C-appendixC.html#appendix-money-on-axis"><i class="fa fa-check"></i><b>C.1.3</b> Showing money on an axis</a></li>
<li class="chapter" data-level="C.1.4" data-path="C-appendixC.html"><a href="C-appendixC.html#appendix-changing-values"><i class="fa fa-check"></i><b>C.1.4</b> Changing values inside cells</a></li>
<li class="chapter" data-level="C.1.5" data-path="C-appendixC.html"><a href="C-appendixC.html#appendix-convert-numerical-categorical"><i class="fa fa-check"></i><b>C.1.5</b> Converting a numerical variable to a categorical one</a></li>
<li class="chapter" data-level="C.1.6" data-path="C-appendixC.html"><a href="C-appendixC.html#appendix-prop"><i class="fa fa-check"></i><b>C.1.6</b> Computing proportions</a></li>
<li class="chapter" data-level="C.1.7" data-path="C-appendixC.html"><a href="C-appendixC.html#appendix-commas"><i class="fa fa-check"></i><b>C.1.7</b> Dealing with %, commas, and $</a></li>
</ul></li>
<li class="chapter" data-level="C.2" data-path="C-appendixC.html"><a href="C-appendixC.html#interactive-graphics"><i class="fa fa-check"></i><b>C.2</b> Interactive graphics</a>
<ul>
<li class="chapter" data-level="C.2.1" data-path="C-appendixC.html"><a href="C-appendixC.html#interactive-linegraphs"><i class="fa fa-check"></i><b>C.2.1</b> Interactive linegraphs</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="D" data-path="D-appendixD.html"><a href="D-appendixD.html"><i class="fa fa-check"></i><b>D</b> Learning Check Solutions</a>
<ul>
<li class="chapter" data-level="D.1" data-path="D-appendixD.html"><a href="D-appendixD.html#chapter-1-solutions"><i class="fa fa-check"></i><b>D.1</b> Chapter 1 Solutions</a></li>
</ul></li>
<li class="chapter" data-level="E" data-path="E-appendixE.html"><a href="E-appendixE.html"><i class="fa fa-check"></i><b>E</b> Versions of R Packages Used</a></li>
<li class="chapter" data-level="" data-path="references.html"><a href="references.html"><i class="fa fa-check"></i>References</a></li>
</ul>
</nav>
</div>
<div class="book-body">
<div class="body-inner">
<div class="book-header" role="navigation">
<h1>
<i class="fa fa-circle-o-notch fa-spin"></i><a href="./">Statistical Inference via Data Science</a>
</h1>
</div>
<div class="page-wrapper" tabindex="-1" role="main">
<div class="page-inner">
<section class="normal" id="section-">
<div id="thinking-with-data" class="section level1" number="11">
<h1><span class="header-section-number">Chapter 11</span> Tell Your Story with Data</h1>
<p>Recall that in the Preface and at the end of chapters throughout this book, we displayed the “<em>ModernDive</em> flowchart” mapping your journey through this book.</p>
<div class="figure" style="text-align: center"><span id="fig:moderndive-figure-conclusion"></span>
<img src="images/flowcharts/flowchart/flowchart.002.png" alt="ModernDive flowchart." width="100%" height="100%" />
<p class="caption">
FIGURE 11.1: <em>ModernDive</em> flowchart.
</p>
</div>
<div id="review" class="section level2" number="11.1">
<h2><span class="header-section-number">11.1</span> Review</h2>
<p>Let’s review what you’ve covered so far. You first got started with data in Chapter <a href="1-getting-started.html#getting-started">1</a>, where you learned about the difference between R and RStudio, started coding in R, installed and loaded your first R packages, and explored your first dataset: all domestic departure <code>flights</code> from a major New York City airport in 2013. Then you covered the following three parts of this book (items 2 and 4 in the list below belong to the same part, data modeling with <code>moderndive</code>):</p>
<ol style="list-style-type: decimal">
<li>Data science with <code>tidyverse</code>. You assembled your data science toolbox using <code>tidyverse</code> packages. In particular, you
<ul>
<li>Ch.<a href="2-viz.html#viz">2</a>: Visualized data using the <code>ggplot2</code> package.</li>
<li>Ch.<a href="3-wrangling.html#wrangling">3</a>: Wrangled data using the <code>dplyr</code> package.</li>
<li>Ch.<a href="4-tidy.html#tidy">4</a>: Learned about the concept of “tidy” data as a standardized data frame input and output format for all packages in the <code>tidyverse</code>. Furthermore, you learned how to import spreadsheet files into R using the <code>readr</code> package.</li>
</ul></li>
<li>Data modeling with <code>moderndive</code>. Using these data science tools and helper functions from the <code>moderndive</code> package, you fit your first data models. In particular, you
<ul>
<li>Ch.<a href="5-regression.html#regression">5</a>: Discovered basic regression models with only one explanatory variable.</li>
<li>Ch.<a href="6-multiple-regression.html#multiple-regression">6</a>: Examined multiple regression models with more than one explanatory variable.</li>
</ul></li>
<li>Statistical inference with <code>infer</code>. Once again using your newly acquired data science tools, you unpacked statistical inference using the <code>infer</code> package. In particular, you
<ul>
<li>Ch.<a href="7-sampling.html#sampling">7</a>: Learned about the role that sampling variability plays in statistical inference and the role that sample size plays in this sampling variability.</li>
<li>Ch.<a href="8-confidence-intervals.html#confidence-intervals">8</a>: Constructed confidence intervals using bootstrapping.</li>
<li>Ch.<a href="9-hypothesis-testing.html#hypothesis-testing">9</a>: Conducted hypothesis tests using permutation.</li>
</ul></li>
<li>Data modeling with <code>moderndive</code> (revisited): Armed with your understanding of statistical inference, you revisited and reviewed the models you constructed in Ch.<a href="5-regression.html#regression">5</a> and Ch.<a href="6-multiple-regression.html#multiple-regression">6</a>. In particular, you
<ul>
<li>Ch.<a href="10-inference-for-regression.html#inference-for-regression">10</a>: Interpreted confidence intervals and hypothesis tests in a regression setting.</li>
</ul></li>
</ol>
<p>We’ve guided you through your first experiences of <a href="https://arxiv.org/pdf/1410.3127.pdf">“thinking with data,”</a> an expression originally coined by Dr. Diane Lambert. The philosophy underlying this expression guided your path in the flowchart in Figure <a href="11-thinking-with-data.html#fig:moderndive-figure-conclusion">11.1</a>.</p>
<p>This philosophy is also well summarized in <a href="https://peerj.com/collections/50-practicaldatascistats/">“Practical Data Science for Stats”</a>: a collection of pre-prints focusing on the practical side of data science workflows and statistical analysis curated by <a href="https://twitter.com/jennybryan">Dr. Jennifer Bryan</a> and <a href="https://twitter.com/hadleywickham">Dr. Hadley Wickham</a>. They write:</p>
<blockquote>
<p>There are many aspects of day-to-day analytical work that are almost absent from the conventional statistics literature and curriculum. And yet these activities account for a considerable share of the time and effort of data analysts and applied statisticians. The goal of this collection is to increase the visibility and adoption of modern data analytical workflows. We aim to facilitate the transfer of tools and frameworks between industry and academia, between software engineering and statistics and computer science, and across different domains.</p>
</blockquote>
<p>In other words, to be equipped to “think with data” in the 21st century, analysts need practice going through the <a href="http://r4ds.had.co.nz/explore-intro.html">“data/science pipeline”</a> we saw in the Preface (re-displayed in Figure <a href="11-thinking-with-data.html#fig:pipeline-figure-conclusion">11.2</a>). It is our opinion that, for too long, statistics education has only focused on parts of this pipeline, instead of going through it in its <em>entirety</em>.</p>
<div class="figure" style="text-align: center"><span id="fig:pipeline-figure-conclusion"></span>
<img src="images/r4ds/data_science_pipeline.png" alt="Data/science pipeline." width="70%" height="70%" />
<p class="caption">
FIGURE 11.2: Data/science pipeline.
</p>
</div>
<p>To conclude this book, we’ll present you with some additional case studies of working with data. In Section <a href="11-thinking-with-data.html#seattle-house-prices">11.2</a> we’ll take you through a full pass of the “data/science pipeline” in order to analyze the sale price of houses in Seattle, WA, USA. In Section <a href="11-thinking-with-data.html#data-journalism">11.3</a>, we’ll present you with some examples of effective data storytelling drawn from the data journalism website <a href="https://fivethirtyeight.com/">FiveThirtyEight.com</a>. We present these case studies to you because we believe that you should not only be able to “think with data,” but also be able to “tell your story with data.” Let’s explore how to do this!</p>
<div id="story-packages" class="section level3 unnumbered">
<h3>Needed packages</h3>
<p>Let’s load all the packages needed for this chapter (this assumes you’ve already installed them). Read Section <a href="1-getting-started.html#packages">1.3</a> for information on how to install and load R packages.</p>
<div class="sourceCode" id="cb433"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb433-1"><a href="11-thinking-with-data.html#cb433-1" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(tidyverse)</span>
<span id="cb433-2"><a href="11-thinking-with-data.html#cb433-2" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(moderndive)</span>
<span id="cb433-3"><a href="11-thinking-with-data.html#cb433-3" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(skimr)</span>
<span id="cb433-4"><a href="11-thinking-with-data.html#cb433-4" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(fivethirtyeight)</span></code></pre></div>
</div>
</div>
<div id="seattle-house-prices" class="section level2" number="11.2">
<h2><span class="header-section-number">11.2</span> Case study: Seattle house prices</h2>
<p><a href="https://www.kaggle.com/">Kaggle.com</a> is a machine learning and predictive modeling competition website that hosts datasets uploaded by companies, governmental organizations, and other individuals. One of its datasets is <a href="https://www.kaggle.com/harlfoxem/housesalesprediction">“House Sales in King County, USA.”</a> It consists of sale prices of homes sold between May 2014 and May 2015 in King County, Washington, USA, which includes the greater Seattle metropolitan area. This dataset is in the <code>house_prices</code> data frame included in the <code>moderndive</code> package.</p>
<p>The dataset consists of 21,613 houses and 21 variables describing these houses (for a full list and description of these variables, see the help file by running <code>?house_prices</code> in the console). In this case study, we’ll create a multiple regression model where:</p>
<ul>
<li>The outcome variable <span class="math inline">\(y\)</span> is the sale <code>price</code> of houses.</li>
<li>Two explanatory variables:
<ol style="list-style-type: decimal">
<li>A numerical explanatory variable <span class="math inline">\(x_1\)</span>: house size <code>sqft_living</code> as measured in square feet of living space. Note that 1 square foot is about 0.09 square meters.</li>
<li>A categorical explanatory variable <span class="math inline">\(x_2\)</span>: house <code>condition</code>, a categorical variable with five levels where <code>1</code> indicates “poor” and <code>5</code> indicates “excellent.”</li>
</ol></li>
</ul>
<div id="house-prices-EDA-I" class="section level3" number="11.2.1">
<h3><span class="header-section-number">11.2.1</span> Exploratory data analysis: Part I</h3>
<p>As we’ve said numerous times throughout this book, a crucial first step when presented with data is to perform an exploratory data analysis (EDA). Exploratory data analysis can give you a sense of your data, help identify issues with your data, bring to light any outliers, and help inform model construction.</p>
<p>Recall the three common steps in an exploratory data analysis we introduced in Subsection <a href="5-regression.html#model1EDA">5.1.1</a>:</p>
<ol style="list-style-type: decimal">
<li>Looking at the raw data values.</li>
<li>Computing summary statistics.</li>
<li>Creating data visualizations.</li>
</ol>
<p>First, let’s look at the raw data using <code>View()</code> to bring up RStudio’s spreadsheet viewer and the <code>glimpse()</code> function from the <code>dplyr</code> package:</p>
<div class="sourceCode" id="cb434"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb434-1"><a href="11-thinking-with-data.html#cb434-1" aria-hidden="true" tabindex="-1"></a><span class="fu">View</span>(house_prices)</span>
<span id="cb434-2"><a href="11-thinking-with-data.html#cb434-2" aria-hidden="true" tabindex="-1"></a><span class="fu">glimpse</span>(house_prices)</span></code></pre></div>
<pre><code>Rows: 21,613
Columns: 21
$ id            <chr> "7129300520", "6414100192", "5631500400", "2487200875", …
$ date          <date> 2014-10-13, 2014-12-09, 2015-02-25, 2014-12-09, 2015-02…
$ price         <dbl> 221900, 538000, 180000, 604000, 510000, 1225000, 257500,…
$ bedrooms      <int> 3, 3, 2, 4, 3, 4, 3, 3, 3, 3, 3, 2, 3, 3, 5, 4, 3, 4, 2,…
$ bathrooms     <dbl> 1.00, 2.25, 1.00, 3.00, 2.00, 4.50, 2.25, 1.50, 1.00, 2.…
$ sqft_living   <int> 1180, 2570, 770, 1960, 1680, 5420, 1715, 1060, 1780, 189…
$ sqft_lot      <int> 5650, 7242, 10000, 5000, 8080, 101930, 6819, 9711, 7470,…
$ floors        <dbl> 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1…
$ waterfront    <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
$ view          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0,…
$ condition     <fct> 3, 3, 3, 5, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3, 4, 4,…
$ grade         <fct> 7, 7, 6, 7, 8, 11, 7, 7, 7, 7, 8, 7, 7, 7, 7, 9, 7, 7, 7…
$ sqft_above    <int> 1180, 2170, 770, 1050, 1680, 3890, 1715, 1060, 1050, 189…
$ sqft_basement <int> 0, 400, 0, 910, 0, 1530, 0, 0, 730, 0, 1700, 300, 0, 0, …
$ yr_built      <int> 1955, 1951, 1933, 1965, 1987, 2001, 1995, 1963, 1960, 20…
$ yr_renovated  <int> 0, 1991, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ zipcode       <fct> 98178, 98125, 98028, 98136, 98074, 98053, 98003, 98198, …
$ lat           <dbl> 47.5, 47.7, 47.7, 47.5, 47.6, 47.7, 47.3, 47.4, 47.5, 47…
$ long          <dbl> -122, -122, -122, -122, -122, -122, -122, -122, -122, -1…
$ sqft_living15 <int> 1340, 1690, 2720, 1360, 1800, 4760, 2238, 1650, 1780, 23…
$ sqft_lot15    <int> 5650, 7639, 8062, 5000, 7503, 101930, 6819, 9711, 8113, …</code></pre>
<p>Here are some questions you can ask yourself at this stage of an EDA: Which variables are numerical? Which are categorical? For the categorical variables, what are their levels? Besides the variables we’ll be using in our regression model, what other variables do you think would be useful to use in a model for house price?</p>
<p>Observe, for example, that while the <code>condition</code> variable has values <code>1</code> through <code>5</code>, these are saved in R as <code>fct</code> standing for “factors.” This is one of R’s ways of saving categorical variables. So you should think of these as the “labels” <code>1</code> through <code>5</code> and not the numerical values <code>1</code> through <code>5</code>.</p>
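<p>To check this for yourself, the following minimal sketch lists the levels of <code>condition</code> and counts the houses at each level. It assumes the <code>dplyr</code> and <code>moderndive</code> packages are installed; the object names <code>condition_levels</code> and <code>condition_counts</code> are our own illustrative choices, not names used elsewhere in the book.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)
library(moderndive)

# The levels are stored as text labels, not as numbers
condition_levels <- levels(house_prices$condition)
condition_levels

# Count how many houses fall into each level of condition
condition_counts <- house_prices %>%
  count(condition)
condition_counts

# A factor is not a numerical variable, so this returns FALSE
is.numeric(house_prices$condition)</code></pre></div>
<p>Since <code>condition</code> is a factor, numerical summaries such as <code>mean()</code> are not meaningful for it; counting the rows at each level is the appropriate summary.</p>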
<p>Let’s now perform the second step in an EDA: computing summary statistics. Recall from Section <a href="3-wrangling.html#summarize">3.5</a> that <em>summary statistics</em> are single numerical values that summarize a large number of values. Examples of summary statistics include the mean, the median, the standard deviation, and various percentiles.</p>
<p>We could do this using the <code>summarize()</code> function in the <code>dplyr</code> package along with R’s built-in <em>summary functions</em>, like <code>mean()</code> and <code>median()</code>. However, recall that in Section <a href="3-wrangling.html#mutate">3.7</a> we saw the following code, which computes a variety of summary statistics of the variable <code>gain</code>, the amount of time that a flight makes up mid-air:</p>
<div class="sourceCode" id="cb436"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb436-1"><a href="11-thinking-with-data.html#cb436-1" aria-hidden="true" tabindex="-1"></a>gain_summary <span class="ot"><-</span> flights <span class="sc">%>%</span> </span>
<span id="cb436-2"><a href="11-thinking-with-data.html#cb436-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">summarize</span>(</span>
<span id="cb436-3"><a href="11-thinking-with-data.html#cb436-3" aria-hidden="true" tabindex="-1"></a> <span class="at">min =</span> <span class="fu">min</span>(gain, <span class="at">na.rm =</span> <span class="cn">TRUE</span>),</span>
<span id="cb436-4"><a href="11-thinking-with-data.html#cb436-4" aria-hidden="true" tabindex="-1"></a> <span class="at">q1 =</span> <span class="fu">quantile</span>(gain, <span class="fl">0.25</span>, <span class="at">na.rm =</span> <span class="cn">TRUE</span>),</span>
<span id="cb436-5"><a href="11-thinking-with-data.html#cb436-5" aria-hidden="true" tabindex="-1"></a> <span class="at">median =</span> <span class="fu">quantile</span>(gain, <span class="fl">0.5</span>, <span class="at">na.rm =</span> <span class="cn">TRUE</span>),</span>
<span id="cb436-6"><a href="11-thinking-with-data.html#cb436-6" aria-hidden="true" tabindex="-1"></a> <span class="at">q3 =</span> <span class="fu">quantile</span>(gain, <span class="fl">0.75</span>, <span class="at">na.rm =</span> <span class="cn">TRUE</span>),</span>
<span id="cb436-7"><a href="11-thinking-with-data.html#cb436-7" aria-hidden="true" tabindex="-1"></a> <span class="at">max =</span> <span class="fu">max</span>(gain, <span class="at">na.rm =</span> <span class="cn">TRUE</span>),</span>
<span id="cb436-8"><a href="11-thinking-with-data.html#cb436-8" aria-hidden="true" tabindex="-1"></a> <span class="at">mean =</span> <span class="fu">mean</span>(gain, <span class="at">na.rm =</span> <span class="cn">TRUE</span>),</span>
<span id="cb436-9"><a href="11-thinking-with-data.html#cb436-9" aria-hidden="true" tabindex="-1"></a> <span class="at">sd =</span> <span class="fu">sd</span>(gain, <span class="at">na.rm =</span> <span class="cn">TRUE</span>),</span>
<span id="cb436-10"><a href="11-thinking-with-data.html#cb436-10" aria-hidden="true" tabindex="-1"></a> <span class="at">missing =</span> <span class="fu">sum</span>(<span class="fu">is.na</span>(gain))</span>
<span id="cb436-11"><a href="11-thinking-with-data.html#cb436-11" aria-hidden="true" tabindex="-1"></a> )</span></code></pre></div>
<p>Repeating this for all three variables (<code>price</code>, <code>sqft_living</code>, and <code>condition</code>) would be tedious to code up. So instead, let’s use the convenient <code>skim()</code> function from the <code>skimr</code> package we first used in Subsection <a href="6-multiple-regression.html#model4EDA">6.1.1</a>, being sure to <code>select()</code> only the variables of interest for our model:</p>
<div class="sourceCode" id="cb437"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb437-1"><a href="11-thinking-with-data.html#cb437-1" aria-hidden="true" tabindex="-1"></a>house_prices <span class="sc">%>%</span> </span>
<span id="cb437-2"><a href="11-thinking-with-data.html#cb437-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">select</span>(price, sqft_living, condition) <span class="sc">%>%</span> </span>
<span id="cb437-3"><a href="11-thinking-with-data.html#cb437-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">skim</span>()</span></code></pre></div>
<pre><code>Skim summary statistics
 n obs: 21613
 n variables: 3
── Variable type:factor
  variable missing complete     n n_unique                         top_counts ordered
 condition       0    21613 21613        5 3: 14031, 4: 5679, 5: 1701, 2: 172   FALSE
── Variable type:integer
    variable missing complete     n   mean     sd  p0  p25  p50  p75  p100
 sqft_living       0    21613 21613 2079.9 918.44 290 1427 1910 2550 13540
── Variable type:numeric
 variable missing complete     n      mean       sd    p0    p25    p50    p75    p100
    price       0    21613 21613 540088.14 367127.2 75000 321950 450000 645000 7700000</code></pre>
<p>Observe that the mean <code>price</code> of $540,088 is larger than the median of $450,000. This is because a small number of very expensive houses are inflating the average. In other words, there are “outlier” house prices in our dataset. (This fact will become even more apparent when we create our visualizations next.)</p>
<p>However, the median is not as sensitive to such outlier house prices. This is why news reports about the real estate market generally cite median house prices and not mean/average house prices. We say here that the median is more <em>robust to outliers</em> than the mean. Similarly, while the standard deviation and the interquartile range (IQR) are both measures of spread and variability, the IQR is more <em>robust to outliers</em>.</p>
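<p>As a minimal sketch of this comparison, the code below computes the two measures of center and the two measures of spread for <code>price</code> side by side. It assumes the <code>dplyr</code> and <code>moderndive</code> packages are installed; the name <code>price_summary</code> is our own illustrative choice.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)
library(moderndive)

# Center: mean vs. median. Spread: standard deviation vs. IQR.
price_summary <- house_prices %>%
  summarize(
    mean = mean(price),
    median = median(price),
    sd = sd(price),
    iqr = IQR(price)
  )
price_summary</code></pre></div>
<p>The IQR is the distance between the 75th and 25th percentiles. Using the <code>p75</code> and <code>p25</code> values from the <code>skim()</code> output, that is $645,000 − $321,950 = $323,050, which is smaller than the standard deviation of about $367,127. This is in line with the standard deviation being pulled upward by the outlier prices.</p>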
<p>Let’s now perform the last of the three common steps in an exploratory data analysis: creating data visualizations. Let’s first create <em>univariate</em> visualizations. These are plots focusing on a single variable at a time. Since <code>price</code> and <code>sqft_living</code> are numerical variables, we can visualize their distributions using a <code>geom_histogram()</code> as seen in Section <a href="2-viz.html#histograms">2.6</a> on histograms. On the other hand, since <code>condition</code> is categorical, we can visualize its distribution using a <code>geom_bar()</code>. Recall from Section <a href="2-viz.html#geombar">2.8</a> on barplots that since <code>condition</code> is not “pre-counted,” we use a <code>geom_bar()</code> and not a <code>geom_col()</code>.</p>
<div class="sourceCode" id="cb439"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb439-1"><a href="11-thinking-with-data.html#cb439-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Histogram of house price:</span></span>
<span id="cb439-2"><a href="11-thinking-with-data.html#cb439-2" aria-hidden="true" tabindex="-1"></a><span class="fu">ggplot</span>(house_prices, <span class="fu">aes</span>(<span class="at">x =</span> price)) <span class="sc">+</span></span>
<span id="cb439-3"><a href="11-thinking-with-data.html#cb439-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">geom_histogram</span>(<span class="at">color =</span> <span class="st">"white"</span>) <span class="sc">+</span></span>
<span id="cb439-4"><a href="11-thinking-with-data.html#cb439-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">labs</span>(<span class="at">x =</span> <span class="st">"price (USD)"</span>, <span class="at">title =</span> <span class="st">"House price"</span>)</span>
<span id="cb439-5"><a href="11-thinking-with-data.html#cb439-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb439-6"><a href="11-thinking-with-data.html#cb439-6" aria-hidden="true" tabindex="-1"></a><span class="co"># Histogram of sqft_living:</span></span>
<span id="cb439-7"><a href="11-thinking-with-data.html#cb439-7" aria-hidden="true" tabindex="-1"></a><span class="fu">ggplot</span>(house_prices, <span class="fu">aes</span>(<span class="at">x =</span> sqft_living)) <span class="sc">+</span></span>
<span id="cb439-8"><a href="11-thinking-with-data.html#cb439-8" aria-hidden="true" tabindex="-1"></a> <span class="fu">geom_histogram</span>(<span class="at">color =</span> <span class="st">"white"</span>) <span class="sc">+</span></span>
<span id="cb439-9"><a href="11-thinking-with-data.html#cb439-9" aria-hidden="true" tabindex="-1"></a> <span class="fu">labs</span>(<span class="at">x =</span> <span class="st">"living space (square feet)"</span>, <span class="at">title =</span> <span class="st">"House size"</span>)</span>
<span id="cb439-10"><a href="11-thinking-with-data.html#cb439-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb439-11"><a href="11-thinking-with-data.html#cb439-11" aria-hidden="true" tabindex="-1"></a><span class="co"># Barplot of condition:</span></span>
<span id="cb439-12"><a href="11-thinking-with-data.html#cb439-12" aria-hidden="true" tabindex="-1"></a><span class="fu">ggplot</span>(house_prices, <span class="fu">aes</span>(<span class="at">x =</span> condition)) <span class="sc">+</span></span>
<span id="cb439-13"><a href="11-thinking-with-data.html#cb439-13" aria-hidden="true" tabindex="-1"></a> <span class="fu">geom_bar</span>() <span class="sc">+</span></span>
<span id="cb439-14"><a href="11-thinking-with-data.html#cb439-14" aria-hidden="true" tabindex="-1"></a> <span class="fu">labs</span>(<span class="at">x =</span> <span class="st">"condition"</span>, <span class="at">title =</span> <span class="st">"House condition"</span>)</span></code></pre></div>
<p>In Figure <a href="11-thinking-with-data.html#fig:house-prices-viz">11.3</a>, we display all three of these visualizations at once.</p>
<div class="figure" style="text-align: center"><span id="fig:house-prices-viz"></span>
<img src="ModernDive_files/figure-html/house-prices-viz-1.png" alt="Exploratory visualizations of Seattle house prices data." width="\textwidth" />
<p class="caption">
FIGURE 11.3: Exploratory visualizations of Seattle house prices data.
</p>
</div>
<p>First, observe in the bottom plot that most houses are of condition “3,” with a few more of conditions “4” and “5,” and almost none that are “1” or “2.”</p>
<p>Next, observe in the histogram for <code>price</code> in the top-left plot that a majority of houses are priced at less than two million dollars. Observe also that the x-axis stretches out to 8 million dollars, even though there do not appear to be any houses close to that price. This is because there are a <em>very small number</em> of houses with prices closer to 8 million. These are the outlier house prices we mentioned earlier. We say that the variable <code>price</code> is <em>right-skewed</em> as exhibited by the long right tail.</p>
<p>Further, observe in the histogram of <code>sqft_living</code> in the middle plot that most houses appear to have less than 5000 square feet of living space. For comparison, a football field in the US is about 57,600 square feet, whereas a standard soccer/association football field is about 64,000 square feet. Observe that this variable is also right-skewed, although not as drastically as the <code>price</code> variable.</p>
<p>For both the <code>price</code> and <code>sqft_living</code> variables, the right-skew makes it hard to distinguish houses at the lower end of the x-axis. This is because the scale of the x-axis is compressed by the small number of very expensive and very large houses.</p>
<p>So what can we do about this skew? Let’s apply a <em>log10 transformation</em> to these variables. If you are unfamiliar with such transformations, we highly recommend you read Appendix <a href="A-appendixA.html#appendix-log10-transformations">A.3</a> on logarithmic (log) transformations. In summary, log transformations allow us to alter the scale of a variable to focus on <em>multiplicative</em> changes instead of <em>additive</em> changes. In other words, they shift the view to be on <em>relative</em> changes instead of <em>absolute</em> changes. Such multiplicative/relative changes are also called changes in <em>orders of magnitude</em>.</p>
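<p>To make the multiplicative-versus-additive idea concrete, here is a minimal sketch using three made-up prices, each ten times the previous one. The vector <code>prices</code> is hypothetical and is not taken from <code>house_prices</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical prices, each ten times the previous one
prices <- c(10000, 100000, 1000000)

# On the original scale, the gaps between consecutive prices are unequal
diff(prices)

# On the log10 scale, each tenfold (multiplicative) change is a step of 1
diff(log10(prices))

# Doubling adds the same amount, log10(2), whatever the starting price
log10(2 * prices) - log10(prices)</code></pre></div>
<p>On the original scale the two gaps differ by a factor of ten, whereas on the log10 scale every tenfold increase is a step of 1 and every doubling is a step of <code>log10(2)</code>, or about 0.301.</p>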
<p>Let’s create new log10 transformed versions of the right-skewed variables <code>price</code> and <code>sqft_living</code> using the <code>mutate()</code> function from Section <a href="3-wrangling.html#mutate">3.7</a>, but we’ll give the latter the name <code>log10_size</code>, which is shorter and easier to understand than the name <code>log10_sqft_living</code>.</p>
<div class="sourceCode" id="cb440"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb440-1"><a href="11-thinking-with-data.html#cb440-1" aria-hidden="true" tabindex="-1"></a>house_prices <span class="ot"><-</span> house_prices <span class="sc">%>%</span></span>
<span id="cb440-2"><a href="11-thinking-with-data.html#cb440-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">mutate</span>(</span>
<span id="cb440-3"><a href="11-thinking-with-data.html#cb440-3" aria-hidden="true" tabindex="-1"></a> <span class="at">log10_price =</span> <span class="fu">log10</span>(price),</span>
<span id="cb440-4"><a href="11-thinking-with-data.html#cb440-4" aria-hidden="true" tabindex="-1"></a> <span class="at">log10_size =</span> <span class="fu">log10</span>(sqft_living)</span>
<span id="cb440-5"><a href="11-thinking-with-data.html#cb440-5" aria-hidden="true" tabindex="-1"></a> )</span></code></pre></div>
<p>Let’s display the before and after effects of this transformation on these variables for only the first 10 rows of <code>house_prices</code>:</p>
<div class="sourceCode" id="cb441"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb441-1"><a href="11-thinking-with-data.html#cb441-1" aria-hidden="true" tabindex="-1"></a>house_prices <span class="sc">%>%</span> </span>
<span id="cb441-2"><a href="11-thinking-with-data.html#cb441-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">select</span>(price, log10_price, sqft_living, log10_size)</span></code></pre></div>
<pre><code># A tibble: 21,613 x 4
price log10_price sqft_living log10_size
<dbl> <dbl> <int> <dbl>
1 221900 5.34616 1180 3.07188
2 538000 5.73078 2570 3.40993
3 180000 5.25527 770 2.88649
4 604000 5.78104 1960 3.29226
5 510000 5.70757 1680 3.22531
6 1225000 6.08814 5420 3.73400
7 257500 5.41078 1715 3.23426
8 291850 5.46516 1060 3.02531
9 229500 5.36078 1780 3.25042
10 323000 5.50920 1890 3.27646
# … with 21,603 more rows</code></pre>
<p>Observe in particular the houses in the sixth and third rows. The house in the sixth row has <code>price</code> $1,225,000, which is just above one million dollars. Since <span class="math inline">\(10^6\)</span> is one million, its <code>log10_price</code> is around 6.09.</p>
<p>Contrast this with the other nine houses shown, which all have <code>log10_price</code> less than six since they all have <code>price</code> less than $1,000,000. The house in the third row is the only one of these ten with <code>sqft_living</code> less than 1000. Since <span class="math inline">\(1000 = 10^3\)</span>, it’s the lone house shown with <code>log10_size</code> less than 3.</p>
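<p>The arithmetic for the sixth row can be checked directly, and the transformation can be undone by raising 10 to the log10 value. The following is a small sketch. The names <code>price_6</code> and <code>log10_price_6</code> are ours and only hold the values from that one row.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Price from the sixth row of the output above
price_6 <- 1225000
log10_price_6 <- log10(price_6)

# Rounded to two decimals, this is the "around 6.09" value
round(log10_price_6, 2)

# Raising 10 to the log10 value recovers the original price
10^log10_price_6

# A log10 value is below 6 exactly when the price is below one million
log10(999999) < 6</code></pre></div>
<p>The same back-transformation, <code>10^x</code>, is how a value on the log10 scale can be converted back to dollars or square feet.</p>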
<p>Let’s now visualize the before and after effects of this transformation for <code>price</code> in Figure <a href="11-thinking-with-data.html#fig:log10-price-viz">11.4</a>.</p>
<div class="sourceCode" id="cb443"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb443-1"><a href="11-thinking-with-data.html#cb443-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Before log10 transformation:</span></span>
<span id="cb443-2"><a href="11-thinking-with-data.html#cb443-2" aria-hidden="true" tabindex="-1"></a><span class="fu">ggplot</span>(house_prices, <span class="fu">aes</span>(<span class="at">x =</span> price)) <span class="sc">+</span></span>
<span id="cb443-3"><a href="11-thinking-with-data.html#cb443-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">geom_histogram</span>(<span class="at">color =</span> <span class="st">"white"</span>) <span class="sc">+</span></span>
<span id="cb443-4"><a href="11-thinking-with-data.html#cb443-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">labs</span>(<span class="at">x =</span> <span class="st">"price (USD)"</span>, <span class="at">title =</span> <span class="st">"House price: Before"</span>)</span>
<span id="cb443-5"><a href="11-thinking-with-data.html#cb443-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb443-6"><a href="11-thinking-with-data.html#cb443-6" aria-hidden="true" tabindex="-1"></a><span class="co"># After log10 transformation:</span></span>
<span id="cb443-7"><a href="11-thinking-with-data.html#cb443-7" aria-hidden="true" tabindex="-1"></a><span class="fu">ggplot</span>(house_prices, <span class="fu">aes</span>(<span class="at">x =</span> log10_price)) <span class="sc">+</span></span>
<span id="cb443-8"><a href="11-thinking-with-data.html#cb443-8" aria-hidden="true" tabindex="-1"></a> <span class="fu">geom_histogram</span>(<span class="at">color =</span> <span class="st">"white"</span>) <span class="sc">+</span></span>
<span id="cb443-9"><a href="11-thinking-with-data.html#cb443-9" aria-hidden="true" tabindex="-1"></a> <span class="fu">labs</span>(<span class="at">x =</span> <span class="st">"log10 price (USD)"</span>, <span class="at">title =</span> <span class="st">"House price: After"</span>)</span></code></pre></div>
<div class="figure" style="text-align: center"><span id="fig:log10-price-viz"></span>
<img src="ModernDive_files/figure-html/log10-price-viz-1.png" alt="House price before and after log10 transformation." width="\textwidth" />
<p class="caption">
FIGURE 11.4: House price before and after log10 transformation.
</p>
</div>
<p>Observe that after the transformation, the distribution is much less skewed, and in this case, more symmetric and more bell-shaped. Now you can more easily distinguish the lower-priced houses.</p>
<p>Let’s do the same for house size, where the variable <code>sqft_living</code> was log10 transformed to <code>log10_size</code>.</p>
<div class="sourceCode" id="cb444"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb444-1"><a href="11-thinking-with-data.html#cb444-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Before log10 transformation:</span></span>
<span id="cb444-2"><a href="11-thinking-with-data.html#cb444-2" aria-hidden="true" tabindex="-1"></a><span class="fu">ggplot</span>(house_prices, <span class="fu">aes</span>(<span class="at">x =</span> sqft_living)) <span class="sc">+</span></span>
<span id="cb444-3"><a href="11-thinking-with-data.html#cb444-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">geom_histogram</span>(<span class="at">color =</span> <span class="st">"white"</span>) <span class="sc">+</span></span>
<span id="cb444-4"><a href="11-thinking-with-data.html#cb444-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">labs</span>(<span class="at">x =</span> <span class="st">"living space (square feet)"</span>, <span class="at">title =</span> <span class="st">"House size: Before"</span>)</span>
<span id="cb444-5"><a href="11-thinking-with-data.html#cb444-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb444-6"><a href="11-thinking-with-data.html#cb444-6" aria-hidden="true" tabindex="-1"></a><span class="co"># After log10 transformation:</span></span>
<span id="cb444-7"><a href="11-thinking-with-data.html#cb444-7" aria-hidden="true" tabindex="-1"></a><span class="fu">ggplot</span>(house_prices, <span class="fu">aes</span>(<span class="at">x =</span> log10_size)) <span class="sc">+</span></span>
<span id="cb444-8"><a href="11-thinking-with-data.html#cb444-8" aria-hidden="true" tabindex="-1"></a> <span class="fu">geom_histogram</span>(<span class="at">color =</span> <span class="st">"white"</span>) <span class="sc">+</span></span>
<span id="cb444-9"><a href="11-thinking-with-data.html#cb444-9" aria-hidden="true" tabindex="-1"></a> <span class="fu">labs</span>(<span class="at">x =</span> <span class="st">"log10 living space (square feet)"</span>, <span class="at">title =</span> <span class="st">"House size: After"</span>)</span></code></pre></div>
<div class="figure" style="text-align: center"><span id="fig:log10-size-viz"></span>
<img src="ModernDive_files/figure-html/log10-size-viz-1.png" alt="House size before and after log10 transformation." width="\textwidth" />
<p class="caption">
FIGURE 11.5: House size before and after log10 transformation.
</p>
</div>
<p>Observe in Figure <a href="11-thinking-with-data.html#fig:log10-size-viz">11.5</a> that the log10 transformation has a similar effect of unskewing the variable. We emphasize that while in these two cases the resulting distributions are more symmetric and bell-shaped, this is not always the case.</p>
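<p>As a rough numerical companion to the histograms, the sketch below compares the mean and median before and after a log10 transformation. It uses simulated data rather than <code>house_prices</code>. The vectors <code>right_skewed</code> and <code>roughly_flat</code> and the helper <code>skew_gap()</code> are hypothetical names introduced only for this illustration, and the helper is an informal indicator, not a standard skewness statistic.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(2024)

# Simulated, made-up data: neither vector comes from house_prices
right_skewed <- rlnorm(1000, meanlog = 13, sdlog = 0.5)
roughly_flat <- runif(1000, min = 1, max = 100)

# Rough skew indicator: mean minus median, in units of the standard deviation
# (positive suggests right-skew, negative suggests left-skew)
skew_gap <- function(x) (mean(x) - median(x)) / sd(x)

# Right-skewed sample: the gap shrinks toward 0 after the transformation
skew_gap(right_skewed)
skew_gap(log10(right_skewed))

# Flat sample: the transformation introduces left-skew instead
skew_gap(roughly_flat)
skew_gap(log10(roughly_flat))</code></pre></div>
<p>In the first sample the transformation pulls the mean and median together, much like what we saw for <code>price</code> and <code>sqft_living</code>. In the second sample, which was not right-skewed to begin with, the transformation pushes the mean below the median, so the result is less symmetric than before.</p>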
<p>Given the now symmetric nature of <code>log10_price</code> and <code>log10_size</code>, we are going to revise our multiple regression model to use our new variables:</p>
<ol style="list-style-type: decimal">
<li>The outcome variable <span class="math inline">\(y\)</span> is the log10-transformed sale price of houses, <code>log10_price</code>.</li>
<li>Two explanatory variables:
<ol style="list-style-type: decimal">
<li>A numerical explanatory variable <span class="math inline">\(x_1\)</span>: house size <code>log10_size</code> as measured in log base 10 square feet of living space.</li>
<li>A categorical explanatory variable <span class="math inline">\(x_2\)</span>: house <code>condition</code>, a categorical variable with five levels where <code>1</code> indicates “poor” and <code>5</code> indicates “excellent.”</li>
</ol></li>
</ol>
</div>
<div id="house-prices-EDA-II" class="section level3" number="11.2.2">
<h3><span class="header-section-number">11.2.2</span> Exploratory data analysis: Part II</h3>
<p>Let’s now continue our EDA by creating <em>multivariate</em> visualizations. Unlike the <em>univariate</em> histograms and barplot in the earlier Figures <a href="11-thinking-with-data.html#fig:house-prices-viz">11.3</a>, <a href="11-thinking-with-data.html#fig:log10-price-viz">11.4</a>, and <a href="11-thinking-with-data.html#fig:log10-size-viz">11.5</a>, <em>multivariate</em> visualizations show relationships between more than one variable. This is an important step of an EDA to perform since the goal of modeling is to explore relationships between variables.</p>
<p>Since our model involves a numerical outcome variable, a numerical explanatory variable, and a categorical explanatory variable, we are in a similar regression modeling situation as in Section <a href="6-multiple-regression.html#model4">6.1</a> where we studied the UT Austin teaching scores dataset. Recall in that case the numerical outcome variable was teaching <code>score</code>, the numerical explanatory variable was instructor <code>age</code>, and the categorical explanatory variable was (binary) <code>gender</code>.</p>
<p>We thus have two choices of models we can fit: either (1) an <em>interaction model</em> where the regression line for each <code>condition</code> level will have both a different slope and a different intercept or (2) a <em>parallel slopes model</em> where the regression line for each <code>condition</code> level will have the same slope but different intercepts.</p>
<p>Recall from Subsection <a href="6-multiple-regression.html#model4table">6.1.3</a> that the <code>geom_parallel_slopes()</code> function is a special purpose function that Evgeni Chasnovski created and included in the <code>moderndive</code> package, since the <code>geom_smooth()</code> function in the <code>ggplot2</code> package does not have a convenient way to plot parallel slopes models. We plot both resulting models in Figure <a href="11-thinking-with-data.html#fig:house-price-parallel-slopes">11.6</a>, with the interaction model on the left.</p>
<div class="sourceCode" id="cb445"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb445-1"><a href="11-thinking-with-data.html#cb445-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Plot interaction model</span></span>
<span id="cb445-2"><a href="11-thinking-with-data.html#cb445-2" aria-hidden="true" tabindex="-1"></a><span class="fu">ggplot</span>(house_prices, </span>
<span id="cb445-3"><a href="11-thinking-with-data.html#cb445-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">aes</span>(<span class="at">x =</span> log10_size, <span class="at">y =</span> log10_price, <span class="at">col =</span> condition)) <span class="sc">+</span></span>
<span id="cb445-4"><a href="11-thinking-with-data.html#cb445-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">geom_point</span>(<span class="at">alpha =</span> <span class="fl">0.05</span>) <span class="sc">+</span></span>
<span id="cb445-5"><a href="11-thinking-with-data.html#cb445-5" aria-hidden="true" tabindex="-1"></a> <span class="fu">geom_smooth</span>(<span class="at">method =</span> <span class="st">"lm"</span>, <span class="at">se =</span> <span class="cn">FALSE</span>) <span class="sc">+</span></span>
<span id="cb445-6"><a href="11-thinking-with-data.html#cb445-6" aria-hidden="true" tabindex="-1"></a> <span class="fu">labs</span>(<span class="at">y =</span> <span class="st">"log10 price"</span>, </span>
<span id="cb445-7"><a href="11-thinking-with-data.html#cb445-7" aria-hidden="true" tabindex="-1"></a> <span class="at">x =</span> <span class="st">"log10 size"</span>, </span>
<span id="cb445-8"><a href="11-thinking-with-data.html#cb445-8" aria-hidden="true" tabindex="-1"></a> <span class="at">title =</span> <span class="st">"House prices in Seattle"</span>)</span>
<span id="cb445-9"><a href="11-thinking-with-data.html#cb445-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Plot parallel slopes model</span></span>
<span id="cb445-10"><a href="11-thinking-with-data.html#cb445-10" aria-hidden="true" tabindex="-1"></a><span class="fu">ggplot</span>(house_prices, </span>
<span id="cb445-11"><a href="11-thinking-with-data.html#cb445-11" aria-hidden="true" tabindex="-1"></a> <span class="fu">aes</span>(<span class="at">x =</span> log10_size, <span class="at">y =</span> log10_price, <span class="at">col =</span> condition)) <span class="sc">+</span></span>
<span id="cb445-12"><a href="11-thinking-with-data.html#cb445-12" aria-hidden="true" tabindex="-1"></a> <span class="fu">geom_point</span>(<span class="at">alpha =</span> <span class="fl">0.05</span>) <span class="sc">+</span></span>
<span id="cb445-13"><a href="11-thinking-with-data.html#cb445-13" aria-hidden="true" tabindex="-1"></a> <span class="fu">geom_parallel_slopes</span>(<span class="at">se =</span> <span class="cn">FALSE</span>) <span class="sc">+</span></span>
<span id="cb445-14"><a href="11-thinking-with-data.html#cb445-14" aria-hidden="true" tabindex="-1"></a> <span class="fu">labs</span>(<span class="at">y =</span> <span class="st">"log10 price"</span>, </span>
<span id="cb445-15"><a href="11-thinking-with-data.html#cb445-15" aria-hidden="true" tabindex="-1"></a> <span class="at">x =</span> <span class="st">"log10 size"</span>, </span>
<span id="cb445-16"><a href="11-thinking-with-data.html#cb445-16" aria-hidden="true" tabindex="-1"></a> <span class="at">title =</span> <span class="st">"House prices in Seattle"</span>)</span></code></pre></div>
<div class="figure" style="text-align: center"><span id="fig:house-price-parallel-slopes"></span>
<img src="ModernDive_files/figure-html/house-price-parallel-slopes-1.png" alt="Interaction and parallel slopes models." width="\textwidth" />
<p class="caption">
FIGURE 11.6: Interaction and parallel slopes models.
</p>
</div>
<p>In both cases, we see there is a positive relationship between house price and size, meaning that larger houses tend to be more expensive. Furthermore, in both plots it seems that houses of condition 5 tend to be the most expensive for most house sizes as evidenced by the fact that the line for condition 5 is highest, followed by conditions 4 and 3. As for conditions 1 and 2, this pattern isn’t as clear. Recall from the univariate barplot of <code>condition</code> in Figure <a href="11-thinking-with-data.html#fig:house-prices-viz">11.3</a> that there are only a few houses of condition 1 or 2.</p>
<p>Let’s also show a faceted version of just the interaction model in Figure <a href="11-thinking-with-data.html#fig:house-price-interaction-2">11.7</a>. It is now much more apparent just how few houses are of condition 1 or 2.</p>
<div class="sourceCode" id="cb446"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb446-1"><a href="11-thinking-with-data.html#cb446-1" aria-hidden="true" tabindex="-1"></a><span class="fu">ggplot</span>(house_prices, </span>
<span id="cb446-2"><a href="11-thinking-with-data.html#cb446-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">aes</span>(<span class="at">x =</span> log10_size, <span class="at">y =</span> log10_price, <span class="at">col =</span> condition)) <span class="sc">+</span></span>
<span id="cb446-3"><a href="11-thinking-with-data.html#cb446-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">geom_point</span>(<span class="at">alpha =</span> <span class="fl">0.4</span>) <span class="sc">+</span></span>
<span id="cb446-4"><a href="11-thinking-with-data.html#cb446-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">geom_smooth</span>(<span class="at">method =</span> <span class="st">"lm"</span>, <span class="at">se =</span> <span class="cn">FALSE</span>) <span class="sc">+</span></span>
<span id="cb446-5"><a href="11-thinking-with-data.html#cb446-5" aria-hidden="true" tabindex="-1"></a> <span class="fu">labs</span>(<span class="at">y =</span> <span class="st">"log10 price"</span>, </span>
<span id="cb446-6"><a href="11-thinking-with-data.html#cb446-6" aria-hidden="true" tabindex="-1"></a> <span class="at">x =</span> <span class="st">"log10 size"</span>, </span>
<span id="cb446-7"><a href="11-thinking-with-data.html#cb446-7" aria-hidden="true" tabindex="-1"></a> <span class="at">title =</span> <span class="st">"House prices in Seattle"</span>) <span class="sc">+</span></span>
<span id="cb446-8"><a href="11-thinking-with-data.html#cb446-8" aria-hidden="true" tabindex="-1"></a> <span class="fu">facet_wrap</span>(<span class="sc">~</span> condition)</span></code></pre></div>
<div class="figure" style="text-align: center"><span id="fig:house-price-interaction-2"></span>
<img src="ModernDive_files/figure-html/house-price-interaction-2-1.png" alt="Faceted plot of interaction model." width="\textwidth" />
<p class="caption">
FIGURE 11.7: Faceted plot of interaction model.
</p>
</div>
<p>Which exploratory visualization of the interaction model is better, the one in the left-hand plot of Figure <a href="11-thinking-with-data.html#fig:house-price-parallel-slopes">11.6</a> or the faceted version in Figure <a href="11-thinking-with-data.html#fig:house-price-interaction-2">11.7</a>? There is no universal right answer. You need to make a choice depending on what you want to convey, and own that choice. Including and discussing both is also an option if needed.</p>
</div>
<div id="house-prices-regression" class="section level3" number="11.2.3">
<h3><span class="header-section-number">11.2.3</span> Regression modeling</h3>
<p>Which of the two models in Figure <a href="11-thinking-with-data.html#fig:house-price-parallel-slopes">11.6</a> is “better”: the interaction model in the left-hand plot or the parallel slopes model in the right-hand plot?</p>
<p>We had a similar discussion in Subsection <a href="6-multiple-regression.html#model-selection">6.3.1</a> on <em>model selection</em>. Recall that we stated that we should only favor more complex models if the additional complexity is <em>warranted</em>. In this case, the more complex model is the interaction model since it considers five intercepts and five slopes total. This is in contrast to the parallel slopes model which considers five intercepts but only one common slope.</p>
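<p>The difference in complexity can be seen by counting fitted coefficients. The sketch below does this on a small simulated data frame, <code>toy_houses</code>, whose column names mirror ours but whose values are made up. It is not the <code>house_prices</code> data, and the model names <code>toy_interaction</code> and <code>toy_parallel</code> are hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(76)

# Simulated stand-in for house_prices: same column names, made-up values
toy_houses <- data.frame(
  log10_size = runif(200, min = 2.5, max = 4),
  condition = factor(rep(1:5, each = 40))
)
toy_houses$log10_price <- 3 + 0.8 * toy_houses$log10_size +
  rnorm(200, sd = 0.1)

# Interaction model: `*` gives each condition level its own intercept and slope
toy_interaction <- lm(log10_price ~ log10_size * condition, data = toy_houses)

# Parallel slopes model: `+` gives each level its own intercept, one shared slope
toy_parallel <- lm(log10_price ~ log10_size + condition, data = toy_houses)

length(coef(toy_interaction))  # 10: five intercepts and five slopes
length(coef(toy_parallel))     # 6: five intercepts and one slope</code></pre></div>
<p>Note that <code>lm()</code> reports these as one baseline intercept and slope plus offsets for the other four levels, which is why the counts match “five intercepts and five slopes” and “five intercepts and one slope.”</p>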
<p>Is the additional complexity of the interaction model warranted? Looking at the left-hand plot in Figure <a href="11-thinking-with-data.html#fig:house-price-parallel-slopes">11.6</a>, we’re of the opinion that it is, as evidenced by the slight x-like pattern to some of the lines. Therefore, we’ll focus the rest of this analysis only on the interaction model. This visual approach is somewhat subjective, however, so feel free to disagree!</p>
<p>What are the five different slopes and five different intercepts for the interaction model? We can obtain these values from the regression table. Recall our two-step process for getting the regression table:</p>
<div class="sourceCode" id="cb447"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb447-1"><a href="11-thinking-with-data.html#cb447-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Fit regression model:</span></span>
<span id="cb447-2"><a href="11-thinking-with-data.html#cb447-2" aria-hidden="true" tabindex="-1"></a>price_interaction <span class="ot"><-</span> <span class="fu">lm</span>(log10_price <span class="sc">~</span> log10_size <span class="sc">*</span> condition, </span>
<span id="cb447-3"><a href="11-thinking-with-data.html#cb447-3" aria-hidden="true" tabindex="-1"></a> <span class="at">data =</span> house_prices)</span>
<span id="cb447-4"><a href="11-thinking-with-data.html#cb447-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb447-5"><a href="11-thinking-with-data.html#cb447-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Get regression table:</span></span>
<span id="cb447-6"><a href="11-thinking-with-data.html#cb447-6" aria-hidden="true" tabindex="-1"></a><span class="fu">get_regression_table</span>(price_interaction)</span></code></pre></div>
<table class="table" style="font-size: 16px; margin-left: auto; margin-right: auto;">
<caption style="font-size: initial !important;">
<span id="tab:seattle-interaction">TABLE 11.1: </span>Regression table for interaction model
</caption>
<thead>
<tr>
<th style="text-align:left;">
term
</th>
<th style="text-align:right;">
estimate
</th>
<th style="text-align:right;">
std_error
</th>
<th style="text-align:right;">
statistic
</th>
<th style="text-align:right;">
p_value
</th>
<th style="text-align:right;">
lower_ci
</th>
<th style="text-align:right;">
upper_ci
</th>
</tr>
</thead>
<tbody>
<tr>