\chapter{Using the Book to Learn \Rlang}
\begin{VF}
The important part of becoming a programmer is learning to think like a programmer. You don't need to know the details of a programming language by heart, you can just look that stuff up.\vspace{1ex}
\noindent The treasure is in the structure, not the nails.
\VA{P. Burns}{\emph{Tao Te Programming}, 2012}\nocite{Burns2012}
\end{VF}
\section{Aims of This Chapter}
In this chapter, I describe how I imagine the book can be used most effectively to learn the \Rlang language. Learning \Rlang, and recalling what one has previously learnt and since forgotten, also makes it necessary to use this book and other sources as references. Learning to use \Rlang effectively also involves learning how to search for information and how to ask questions of other users, for example through online forums. Thus, I also give advice on how to find answers to \Rlang-related questions and how to use the available documentation.
\section{Approach and Structure}
Depending on previous experience, reading \emph{Learn R: As a Language} will be about exploring a new world or revisiting a familiar one. In both cases this book aims to be a travel guide, neither a traveller's account nor a cookbook of \Rlang recipes. It can be used as a course book, supplementary reading or for self-instruction, and also as a reference. My hope is that as a guide to the use of \Rlang, this book will remain useful to readers as they gain experience and develop skills.\vspace{1ex}
\noindent
\emph{I encourage readers to approach \Rlang like a child approaches his or her mother tongue when learning to speak: do not struggle, just play, and fool around with \Rlang! If the going gets difficult and frustrating, take a break! If you get a new insight, take a break to enjoy the victory!\vspace{1ex}
}
In \Rlang, like in most ``rich'' languages, there are multiple ways of coding the same operations. I have included code examples that aim to strike a balance between execution speed and readability. One could write equivalent \Rlang books using substantially different code examples. Keep this in mind when reading the book and using \Rlang. Also keep in mind that it is impossible to remember everything about \Rlang, and as a user you will frequently need to consult the documentation, even while doing the exercises in this book. The \Rlang language, in a broad sense, is vast because it can be expanded with independently developed packages. Learning to use \Rlang mainly consists of learning the basics plus developing the skill of finding your way in \Rlang, its documentation and online question-and-answer forums.
Readers should not aim to remember all the details presented in the book. This is impossible for most of us. Using this and other books, and the documentation, effectively as references later on depends on a good grasp of the broad picture of how \Rlang works and on learning how to navigate the documentation; i.e., it is more important to remember abstractions, the situations in which they are used, and function names than the details of how to use them. Developing a sense of when one needs to be careful not to fall into a ``language trap'' is also important.
The book can be used both as a textbook for learning \Rlang and as a reference. It starts with simple concepts and language elements progressing towards more complex language structures and uses. Along the way readers will find, in each chapter, descriptions and examples of the common (usual) cases and the exceptions. Some books hide the exceptions and counterintuitive features from learners to make the learning easier; I instead have included these but marked them using icons and marginal bars. There are two reasons for choosing this approach. First, the boundary between boringly easy and frustratingly challenging is different for each of us, and varies depending on the subject dealt with. So, I hope the marks will help readers predict what to expect, how much effort to put into each section, and even what to read and what to skip. Second, if I had hidden the tricky bits of the \Rlang language, I would have made later use of \Rlang by the readers more difficult. It would have also made the book less useful as a reference.
The book contains many code examples as well as exercises. I expect readers will run code examples and try as many variations of them as needed to develop an understanding of the ``rules'' of the \Rlang language, e.g., how the function or feature exemplified works. This is what long-time users of \Rlang do when facing an unfamiliar feature or a gap in their understanding.
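As a minimal sketch of what trying variations can look like, the chunk below calls \code{mean()} in four slightly different ways. The name \code{vct1} and the values are chosen only for illustration. The last statement is a variation worth trying on purpose: the values 5 and 3 are not treated as data but matched to other parameters of the function, so only the first value is used.

<<learning-variations-sketch>>=
vct1 <- c(1, 5, 3)              # illustrative data
mean(vct1)                      # arithmetic mean of the three values
mean(c(vct1, NA))               # a missing value propagates to the result
mean(c(vct1, NA), na.rm = TRUE) # missing values are removed before averaging
mean(1, 5, 3)                   # 5 and 3 are not data here: the result is 1
@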
Readers who are new to \Rlang should read at least chapters \ref{chap:R:introduction} to \ref{chap:R:functions} sequentially, possibly skipping parts of the text and exercises marked as advanced. However, I expect it to be most useful for these readers not to skip the descriptions of unusual features and special cases completely, but rather to skim them enough to get an idea of what special situations they may face as \Rlang users. Exercises should not be skipped, as they are a key component of the didactic approach used.
Readers already familiar with \Rlang will be able to read the chapters in the book in any order, as the need arises. Marginal bars and icons, and the back and forward cross-references among sections, make it possible for readers to \emph{find a good path} within the book both when learning \Rlang and when using the book as a reference.
I expect \emph{Learn R: As a Language} to remain useful as a reference to those readers who use it to learn \Rlang. It will also be useful as a reference to readers already familiar with \Rlang. To support the use of the book as a reference, I have been thorough with indexing, including many carefully chosen terms, their synonyms, and the names of all \Rlang objects and constructs discussed, collecting them in three alphabetical indexes: \emph{General index}, \emph{Index of R names by category}, and \emph{Alphabetic index of R names} starting at pages \pageref{idx:general}, \pageref{idx:rcats} and \pageref{idx:rindex}, respectively. I have also included back and forward cross-references linking related sections throughout the whole book.
\section{Typographic and Naming Conventions}
\subsection{Call-outs}
Marginal bars and icons are used in the book to indicate which content is advanced or included with a specific aim. The following icons and colours are used.
%\begin{infobox}
%Signals ancillary information, in most cases unrelated to \Rlang as a language.
%\end{infobox}
\begin{explainbox}
Signals in-depth explanations of specific \Rlang features or general programming concepts. Several of these explanations make reference to programming concepts or features of the \Rlang language that are explained later in the book. Readers new to \Rlang and computer programming can safely skip these call-outs on the first reading of the book. To become proficient in the use of \Rlang, these readers are expected to return to these call-outs at a later time, without hurry and preferably with a cup of coffee or tea. Readers with more experience, like those possibly reading individual chapters or using the book as a reference, will find these in-depth explanations useful.
\end{explainbox}
\begin{warningbox}
Signals important bits of information that must be remembered when using \Rlang---i.e., explanations of some unusual, but important, feature of the language or concepts that in my experience are easily missed by those new to \Rlang.
\end{warningbox}
\begin{faqboxNI}{Frequently asked question}
Signals my answer to a question that I expect to be useful to readers based on the popularity of similar or related questions posted in online forums. When reading through the book, these call-outs highlight things that are worth special attention. When using the book as a reference, they help find solutions to frequently encountered difficulties. They are indexed on page \pageref{index:faq}.
\end{faqboxNI}
\begin{playground}
Signals a \emph{playground} containing open-ended exercises: ideas and pieces of \Rlang code to play with at the \Rlang console. I expect readers to run these examples both as is and after creating variations by editing the code, studying the output or diagnostic messages returned by \Rlang in each case. Numbered by chapter for easy reference.
\end{playground}
\begin{advplayground}
Signals an \emph{advanced playground} that requires more time to play with before grasping concepts than regular \emph{playgrounds}. Numbered by chapter together with other playgrounds.
\end{advplayground}
\subsection{Code conventions and syntax highlighting}
Small sections of program code interspersed within the main text are called \emph{code chunks}. In this book \Rlang code chunks are typeset in a typewriter font, using colour to highlight the different elements of the syntax, such as variables, functions, constant values, etc. The command line prompts (\verb|>| and \verb|+|) are not displayed in the chunks. \Rlang code elements embedded in the text are similarly typeset but always black. For example, in the code chunk below, \code{mean()}, \code{c()} and \code{print()} are functions; 1, 5, and 3 are constant numeric values, and \code{z} is the name of a variable where the result of the computation done in the first line of code is stored. The line starting with \code{\#\#} shows what is printed or shown when executing the second statement: \code{[1] 3}. In the book, \code{\#\#} is used as a marker to signal output from \Rlang; it is not part of the output. As \code{\#} is the marker for comments in the \Rlang language, prepending \code{\#\#} to the output makes it possible to copy and paste into the \Rlang console the whole contents of the code chunks as they appear in the book.
<<syntax-highlight-1>>=
z <- mean(c(1, 5, 3))
print(z)
@
When explaining general concepts I use short abstract names, while for real-life examples I use descriptive names. Although not required, for clarity, I use abstract names that hint at the structure of objects stored, such as \code{mat1} for a matrix, \code{vct4} for a vector and \code{df3} for a data frame. This convention resembles that followed by the base \Rlang documentation.
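The naming convention described above can be sketched as follows. The values stored are arbitrary and serve only to show how each abstract name hints at the structure of the object it refers to.

<<learning-naming-sketch>>=
vct4 <- c(2.5, 4, 7.5)               # a numeric vector
mat1 <- matrix(1:6, nrow = 2)        # a matrix with 2 rows and 3 columns
df3 <- data.frame(x = 1:3, y = vct4) # a data frame with two columns
is.vector(vct4)
is.matrix(mat1)
is.data.frame(df3)
@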
Code in playgrounds either works in isolation, or when it depends on objects created in the examples in the main text, this is mentioned within the playground. In playgrounds I use names in capital letters so that they are distinct. The code outside playgrounds does reuse objects created earlier in the same section, and occasionally in earlier sections of the same chapter.
\subsection{Diagrams}
To describe data objects, I use diagrams similar to Joseph N. Hall's PEGS (Perl Graphical Structures) \autocite{Hall1997}. I use colour fill to highlight the type of the stored objects. I use the ``signal'' sign for the names of whole objects and of their component members, the former with a thicker border. Below is an example from chapter \ref{chap:R:as:calc}.
\begin{center}
\begin{footnotesize}
\begin{tikzpicture}[font=\sffamily,
array/.style={matrix of nodes,nodes={draw, minimum size=7mm, fill=codeshadecolor},column sep=-\pgflinewidth, row sep=0.5mm, nodes in empty cells,
row 1/.style={nodes={draw=none, fill=none, minimum size=5mm}},
row 1 column 1/.style={nodes={draw}}}]
\matrix[array] (array) {
1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10\\
& & & & & & & & & \\};
\node[draw, fill=gray, minimum size=4mm] at (array-2-9) (box) {};
\begin{scope}[on background layer]
\fill[blue!10] (array-1-1.north west) rectangle (array-1-10.south east);
\end{scope}
\draw (array-2-1.west) node [signal, draw, fill=codeshadecolor, minimum size=6mm, line width=1.5pt, left] (first) {\textcolor{blue}{\ \code{<name>}\strut}};
\draw (array-1-1.north)--++(90:3mm) node [above] (first) {First index};
\draw (array-1-10.east)--++(0:3mm) node [right]{positional indices};
\draw (array-2-10.east)--++(0:3mm) node [right]{Members or \textcolor{blue}{\code{<values>}}};
\node [align=center, anchor=south] at (array-2-9.north west|-first.south) (8) {element at index 9};
\draw (8)--(box);
%
\end{tikzpicture}
\end{footnotesize}
\end{center}
To describe code structure, I use diagrams based on boxes and arrows, while to describe the flow of code execution I use traditional flow charts.
In the different diagrams, I use the notation \textcolor{blue}{\code{<value>}}, \textcolor{blue}{\code{<statement>}}, \textcolor{blue}{\code{<name>}}, etc., as generic placeholders indicating \emph{any valid value}, \emph{any valid \Rlang statement}, \emph{any valid \Rlang name}, etc.
\section{Finding Answers to Problems}
\subsection{What are the options?}
First of all, do not panic! Every programmer, even those with decades of experience, gets stuck with problems from time to time and can run out of ideas for a while. This is normal and happens to all of us.
It is important to learn how to find answers as part of the routine of using \Rlang. We should start by reading the documentation of the function or object that we are trying to use, which in many cases also includes examples. \Rlang's help pages tell how to use individual functions or objects. In contrast, \Rlang's manual \emph{An Introduction to R}, and other books describe what functions or overall approaches to use for different tasks.
Reading the documentation and books does not always help. Sometimes one can become blind to the obvious, by being too familiar with a piece of code, as also happens when writing in a natural language like English. A second useful step is, thus, looking at the code with ``different eyes'', those of a friend or workmate, or your own eyes a day or a week later.
One can also seek help in specialised online forums or from peers or ``local experts''. If searching in forums for existing questions and answers fails to yield a useful answer, one can write a new question in a forum.
When searching for answers, asking for advice, or reading books, one can be confronted with different ways of approaching the same tasks. Do not allow this to overwhelm you; in most cases, it will not matter which approach you use, as many computations can be done in \Rlang, as in any computer language, in several different ways, still obtaining the same result. Use the alternative that you find easier to understand.
\subsection{\Rlang's built-in help}
Every object available in base \Rlang or exported by an \Rlang extension package (functions, methods, classes, and data) is documented in \Rlang's help system. Sometimes a single help page documents several \Rlang objects. Not only are help pages always available, but they are also structured consistently, with a title, a short description, and frequently also a detailed description. In the case of functions, parameter names, their purpose, and expected arguments are always described, as well as the returned value. Usually at the bottom of help pages, several examples of the use of the objects or functions are given. How to access \Rpgrm help is described in section \ref{sec:intro:using:R} on page \pageref{sec:intro:using:R}.
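As a brief sketch of how the help system described above can be queried from the \Rlang console, the chunk below uses function \code{mean()} only as a convenient illustration. The chunk is not evaluated here because these statements open help pages or print long listings.

<<learning-help-sketch, eval=FALSE>>=
help("mean")            # open the help page for mean()
?mean                   # shorthand for the statement above
example("mean")         # run the examples given in the help page
help.search("median")   # search help pages when the name is not known
help(package = "stats") # list the help pages available in a package
@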
In addition to help pages, \Rpgrm's distribution includes useful manuals as PDF or HTML files. These manuals are also available at \url{https://rstudio.github.io/r-manuals/} restyled for easier reading in web browsers. In addition to help pages, many packages contain \emph{vignettes}, such as user guides or articles describing the algorithms used and/or containing use case examples. In the case of some packages, a web site with documentation in HTML format is also available. Package documentation can also be found in repositories like the \emph{Comprehensive R Archive Network}, better known as \CRAN. From \CRAN it is possible to download \Rpgrm and many extensions to it. The DESCRIPTION file of each \Rlang package provides contact information for the maintainer, links to web sites, and instructions on how to report bugs. Similar information plus a short description are frequently also available in a README file.
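Vignettes and the contents of DESCRIPTION files can also be consulted from within \Rlang. The sketch below uses package \pkgname{grid}, which is distributed together with \Rpgrm, purely as an example; any installed package could be used instead. The chunk is not evaluated here because its output is long.

<<learning-pkg-docs-sketch, eval=FALSE>>=
vignette(package = "grid")  # list the vignettes included in a package
packageDescription("grid")  # show the contents of the DESCRIPTION file
packageVersion("grid")      # version of the installed package
maintainer("grid")          # contact information for the maintainer
@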
Error messages tend to be terse in \Rpgrm, and may require some lateral thinking and/or ``experimentation'' to understand the real cause behind problems. Learning to interpret error messages is necessary to become a proficient user of \Rpgrm, so forcing errors and warnings with purposely written ``bad'' code is a useful exercise.
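One possible way of forcing a warning and an error on purpose is sketched below. Function \code{tryCatch()} is used only so that the messages can be stored and inspected without interrupting the execution of the remaining code. The names \code{msg1} and \code{msg2} are arbitrary, and the wording of the messages depends on the language settings of the \Rlang session.

<<learning-errors-sketch>>=
# a warning: the computation could complete, but the result is suspicious
msg1 <- tryCatch(log(-1),
                 warning = function(w) conditionMessage(w))
msg1
# an error: the computation cannot be completed
msg2 <- tryCatch(1 + "a",
                 error = function(e) conditionMessage(e))
msg2
@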
\subsection{Online forums}\label{sec:intro:net:help}
\subsubsection*{Netiquette}
When posting requests for help, one needs to abide by what is usually described as ``netiquette'', which in many respects also applies to asking in person or by e-mail for help from a peer or local expert. Preference among sources of information depends on what one finds easier to use. Consideration towards others' time is necessary but has to be balanced against wasting too much of one's own time.
In\index{netiquette}\index{network etiquette} most internet forums, a certain behaviour is expected from those asking and answering questions. Some types of misbehaviour, such as the use of offensive or inappropriate language, will usually result in the user losing writing rights in a forum. Occasional minor misbehaviour usually results in the original question not being answered and, instead, the problem highlighted in a comment. In general, following the steps listed below will greatly increase your chances of getting a detailed and useful answer.
\begin{itemize}
\item Do your homework: first search for existing answers to your question, both online and in the documentation. (Do mention that you attempted this without success when you post your question.)
\item Provide a clear explanation of the problem, and all the relevant information. The version of \Rpgrm, operating system, and any packages loaded and their versions can be important.
\item If at all possible, provide a simplified and short, but self-contained, code example that reproduces the problem (sometimes called a \emph{reprex}).
\item Be polite.
\item Contribute to the forum by answering other users' questions when you know the answer.
\end{itemize}
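The information about versions mentioned in the list above can be obtained from within \Rlang. The chunk below is a minimal sketch; package \pkgname{stats} is used only as an example, and in a real question the packages relevant to the problem should be named instead.

<<learning-session-info-sketch>>=
R.version.string         # version of R in use
Sys.info()[["sysname"]]  # operating system
packageVersion("stats")  # version of an individual package
sessionInfo()            # all of the above plus the packages loaded
@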
\begin{explainbox}
Carefully preparing a reproducible example\index{reproducible example} (``reprex'') is crucial. A \emph{reprex} is a self-contained and as simple as possible piece of computer code that triggers (and so demonstrates) a problem. If possible, when data are needed, a data set included in base \Rpgrm or artificial data generated within the reprex code should be used. If the problem can only be reproduced with one's own data, then one needs to provide a minimal subset of it that still triggers the problem.
While preparing a \emph{reprex} one has to simplify the code, and sometimes this step makes clear the nature of the problem. Always, before posting a reprex online, check it with the latest versions of \Rpgrm and any package being used. If sharing data, be careful about confidential information and either remove or mangle it.
I must say that about two out of three times I prepare a \emph{reprex}, it allows me to find the root of the problem and a solution or a work-around on my own. Preparing a \emph{reprex} takes some effort but it is worthwhile even if it ends up not being posted online.
The \Rlang package \pkgname{reprex} and its \RStudio add-in simplify the creation of reproducible code examples, by creating and copying to the clipboard a reprex encoded in \Markdown and ready to be pasted into a question at \stackoverflow or an issue at \GitHub. See \url{https://reprex.tidyverse.org/} for details.
\end{explainbox}
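As an illustration of the advice on reproducible examples given above, the chunk below sketches what a small reprex could look like. The problem shown, the name \code{df1} and the data are invented for the sake of the example: artificial data are generated within the code, and only the statements needed to trigger the unexpected result are kept.

<<learning-reprex-sketch>>=
# artificial data generated within the example; the seed makes it repeatable
set.seed(123)
df1 <- data.frame(group = rep(c("a", "b"), each = 3),
                  value = rnorm(6))
# the unexpected result: a warning and NA instead of a mean
mean(df1)
# the result that was expected
mean(df1$value)
@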
\subsubsection*{StackOverflow}
Nowadays, \stackoverflow (\url{http://stackoverflow.com/}) is the best question-and-answer (Q\,\&\,A) support site for \Rpgrm. Within the \stackoverflow site there is an \Rlang collective. In most cases, searching for existing questions and their answers will be all that you need to do. If asking a question, make sure that it is really a new question. If there is some question that looks similar, make clear how your question is different.
\stackoverflow has a user-rights system based on reputation, and questions and answers can be up- and down-voted. Questions with the most up-votes are listed at the top of searches, and the most-voted answers to each question are also displayed first. Those who ask a question are expected to accept correct answers to help future readers. If the questions or answers one writes are up-voted or if answers are accepted one gains reputation (expressed as a number). As one accumulates reputation, one gets badges and additional rights, such as editing other users' questions and answers or later on, even deleting wrong answers or off-topic questions from the system. This sounds complicated, but works extremely well at ensuring that the base of questions and answers is relevant and correct, without relying heavily on nominated \emph{moderators}. When using \stackoverflow, do contribute by accepting correct answers, up-voting questions and answers that you find useful, down-voting those you consider poor, and flagging or correcting errors you may discover.
Being careful in the preparation of a reproducible example\index{reproducible example}\index{reprex|see{reproducible example}} is important in two situations: 1) when asking a question at \stackoverflow or other online forums and 2) when reporting a bug to the maintainer of any piece of software. For the question to be reliably answered or the problem to be fixed, the person answering a question needs to be able to reproduce the problem and, after modifying the code, needs to be able to test if the problem has been solved or not. However, even if you are facing a problem caused by your misunderstanding of how \Rlang works, the simpler the example, the more likely that someone will quickly realise what your intention was when writing the code that produces a result different from what you expected. Even when it is not possible to create a reprex, one needs to ask clearly only one thing per question.
The code of conduct (\url{https://stackoverflow.com/conduct}) and help that explains expected behaviour (\url{https://stackoverflow.com/help}) are available at the site and worth reading before using the site actively for the first time.
\subsubsection*{Contacting the author}
The best way to get in contact with me about this book is by raising an issue at \url{https://github.com/aphalo/learnr-book-crc/issues}. Issues can be used to ask support questions related to the book, to report mistakes, and to suggest changes to the text, diagrams and/or example code. Edits to the manuscript of this book can be submitted as a pull request.
Issues are raised by filling in an online form, on a web page that also contains brief instructions. \GitHub issues are a very efficient way of keeping track of corrections that need to be made. As support questions usually reveal unclear explanations or other problems, raising issues to ask them facilitates the tasks of improving the book and keeping it up to date.
\section{Further Reading}
To understand what programming as an activity is, you can read \citetitle{Burns2012} \autocite{Burns2012}. It will make learning to program in \Rlang easier, both practically and emotionally. In \citeauthor{Burns2012}'s words ``This is a book about what goes on in the minds of programmers''.