Typesetting adjustments 2

A few edits to text, including correction of errors.
aphalo · Jan 6, 2024
commit 6eca12d · 1 parent 1e3a20d

Showing 30 changed files with 290,379 additions and 290,123 deletions.
diff --git a/R.data.Rnw b/R.data.Rnw
@@ -553,9 +553,8 @@ long_iris.tb |>
 long_iris.tb
 @
 
-In the next few chunks, we print the returned values rather than saving them in variables. In normal use, one would combine these functions into a pipe using operator \Roperator{\textbar >}.
-
-Function \Rfunction{arrange()} is used for sorting the rows---makes sorting a data frame or tibble simpler than by using \Rfunction{sort()} and \Rfunction{order()}. Here, we sort the tibble \code{long\_iris.tb} based on the values in three of its columns.
+In the next few chunks, returned values are displayed, while in normal use they would be assigned to variables or passed to the next function in a pipe using \Roperator{\textbar >}.
+Function \Rfunction{arrange()} is used to sort rows---it makes sorting a data frame or tibble simpler than when using \Rfunction{sort()} or \Rfunction{order()}. Below, \code{long\_iris.tb} rows are sorted based on the values in three of its columns.
 
 <<tidy-tibble-03>>=
 arrange(long_iris.tb, Species, plant_part, part_dimension)
@@ -620,7 +619,8 @@ tibble(numbers = 1:9, Letters = rep(letters[1:3], 3)) |>
             median_num = median(numbers),
             n = n()) |>
   ungroup() # not always needed but safer
-@
+@%
+\pagebreak
 
 In the non-persistent grouping approach, we specify the grouping in the call to \Rfunction{summarise()} (this new feature is labelled as experimental in \pkgname{dplyr} version 1.1.3, and may change in future versions).
 
@@ -659,7 +659,7 @@ names(attributes(my_gr.tb))
 setdiff(attributes(my_gr.tb), attributes(my.tb))
 @
 
-A call to \Rfunction{ungroup()} removes the grouping, restoring the original tibble..
+A call to \Rfunction{ungroup()} removes the grouping, restoring the original tibble.
 
 <<tibble-grouped-box-03>>=
 my_ugr.tb <- ungroup(my_gr.tb)
@@ -726,7 +726,7 @@ right_join(x = first.tb, y = second.tb)
 right_join(x = second.tb, y = first.tb)
 @
 
-An inner join discards all rows in \code{x} that do not have a matching row in \code{y} and \emph{vice versa}.
+An inner join discards rows in \code{x} that do not match rows in \code{y} and \emph{vice versa}.
 
 <<joins-04>>=
 inner_join(x = first.tb, y = second.tb)
@@ -736,7 +736,7 @@ inner_join(x = first.tb, y = second.tb)
 inner_join(x = second.tb, y = first.tb)
 @
 
-Next we apply the \emph{filtering join}\index{joins between data sources!filtering} functions exported by \pkgname{dplyr}: \Rfunction{semi\_join()} and \Rfunction{anti\_join()}. These functions only return a tibble that always contains only the columns from \code{x}, but retains rows based on their match to rows in \code{y}.
+Next we apply the \emph{filtering join}\index{joins between data sources!filtering} functions exported by \pkgname{dplyr}: \Rfunction{semi\_join()} and \Rfunction{anti\_join()}. These functions return a tibble that contains only the columns from \code{x}, retaining rows based on their match to rows in \code{y}.
 
 A semi join retains rows from \code{x} that have a match in \code{y}.
 

diff --git a/R.data.containers.Rnw b/R.data.containers.Rnw
@@ -733,7 +733,7 @@ Related to splitting a data frame is the calculation of summaries based on a sub
 To summarise a single variable by group we can use \Rfunction{aggregate()}.
 
 <<faq-aggregate-01>>=
-aggregate(x = iris$Petal.Length, 
+aggregate(x = iris$Petal.Length,
           by = list(iris$Species), FUN = mean)
 @
 
@@ -743,7 +743,7 @@ aggregate(x = iris$Petal.Length,
 To summarise variables we can use \Rfunction{aggregate()} (see section \ref{sec:dplyr:group:wise} on page \pageref{sec:dplyr:group:wise} for an alternative approach using package \pkgnameNI{dplyr}).
 
 <<faq-aggregate-02>>=
-aggregate(x = iris[ , sapply(iris, is.numeric)], 
+aggregate(x = iris[ , sapply(iris, is.numeric)],
           by = list(iris$Species), FUN = mean)
 @
 
@@ -869,15 +869,15 @@ In this example, column \code{A} of \code{df14} takes precedence, and the return
 <<data-frames-EB-12>>=
 df14 <- data.frame(A = 1:10, B = 3)
 df14$C <- with(df14, (A + B) / A) # add column
-head(df14, 2)
+head(df14, 3)
 @
 
 In the case of \Rscoping{within()}, assignments in the argument to its second parameter affect the object returned, which is a copy of the container (in this case, a whole data frame), which still needs to be saved through assignment. Here the intention is to modify it, so we assign it back to the same name, but it could have been assigned to a different name so as not to overwrite the original data frame.
 
 <<data-frames-EB-13>>=
 df14$C <- NULL
 df15 <- within(df14,  C <- (A + B) / A) # modified copy
-head(df15, 2)
+head(df15, 3)
 @
 
 In the example above, using \code{within()} instead of \Rscoping{with()} makes little difference to the amount of typing or clarity of the code, but with multiple member variables being operated upon, as shown below, using \Rscoping{within()} results in more concise and easier to understand code.
@@ -888,7 +888,7 @@ df16 <- within(df14,
                 D <- A * B
                 E <- A / B + 1}
                )
-head(df16, 2)
+head(df16, 3)
 @
 
 \begin{explainbox}
@@ -898,7 +898,7 @@ Repeatedly pre-pending the name of a \emph{container} such as a list or data fra
 df14$C <- (df14$A + df14$B) / df14$A
 df14$D <- df14$A * df14$B
 df14$E <- df14$A / df14$B + 1
-head(df14, 2)
+head(df14, 3)
 @
 
 Using\index{data frames!attaching}\label{par:calc:attach} \Rscoping{attach()} we can alter where \Rlang looks up names and consequently simplify the statement. With \Rscoping{detach()} we can restore the original state. It is important to remember that here we can only simplify the right-hand side of the assignment, while the ``destination'' of the result of the computation still needs to be fully specified on the left-hand side of the assignment operator. We include below only one statement between \Rscoping{attach()} and \Rscoping{detach()} but multiple statements are allowed. Furthermore, if variables with the same name as the columns exist in the search path, these will take precedence, something that can result in bugs or crashes, or as seen below, a message warns that variable \code{A} from the global environment will be used instead of column \code{A} of the attached \code{df17}. The returned value is, of course, not the desired one.
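The name-lookup problem described above can be sketched in a few lines of base R. This is only an illustrative sketch: the data frame `df.demo` and the global variable `A` are hypothetical names chosen for the example, not objects from the book. It contrasts `with()`, which looks up names in the data frame first, with `attach()`, where an existing variable of the same name takes precedence over the column.

```r
# Hypothetical data; the names are illustrative only
df.demo <- data.frame(A = 1:3, B = 3)
A <- c(10, 20, 30) # a variable with the same name as column A

# with() evaluates the expression looking up names in df.demo first,
# so column A is used
ratio.with <- with(df.demo, (A + B) / A)

# after attach(), the existing variable A masks column A,
# while B is still found in the attached data frame
attach(df.demo) # R reports that A is masked
ratio.attach <- (A + B) / A
detach(df.demo) # restore the original search path

rm(A) # clean up
```

Here `ratio.with` is computed from the columns of `df.demo`, while `ratio.attach` silently mixes the unrelated variable `A` with column `B`, which is the kind of bug the paragraph above warns about.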

diff --git a/R.data.io.Rnw b/R.data.io.Rnw
@@ -836,7 +836,8 @@ longitude <-  ncvar_get(meteo_data.nc, "lon")
 head(longitude)
 latitude <- ncvar_get(meteo_data.nc, "lat")
 head(latitude)
-@
+@%
+\pagebreak
 
 The \code{time} vector contains only monthly values as the file contains a long-term series of monthly averages, expressed as days from 1800-01-01 corresponding to the first day of each month of year "1". We use package \pkgname{lubridate} for the conversion. To find the indexes for the grid point of interest, it is necessary to study the vectors \code{longitude} and \code{latitude} saved above.
 
@@ -925,12 +926,12 @@ While functions in package \pkgname{readr} support the use of URLs, those in pac
 For portability, \pgrmname{MS-Excel} files should be downloaded in binary mode, setting \code{mode = "wb"}, which is required under \osname{MS-Windows}.
 \end{warningbox}
 
-
 <<url-11, eval=eval_online_data>>=
 download.file("http://r4photobiology.info/learnr/my-data.xlsx",
               "data/my-data-dwn.xlsx",
               mode = "wb")
-@
+@%
+\pagebreak
 
 Functions in package \pkgname{foreign}, as well as those in package \pkgname{haven}, support URLs. See section \ref{sec:files:stat} on page \pageref{sec:files:stat} for more information about importing this kind of data into \Rlang.
 

diff --git a/R.functions.Rnw b/R.functions.Rnw
@@ -198,23 +198,18 @@ After the toy examples above, we will define a small but useful function: a func
 SEM <- function(x){sqrt(var(x) / length(x))}
 @
 
-We can test our function.
+As a test, we call \Rfunction{SEM()} first with \code{a} and then with \code{a.na} as its argument.
 
 <<fun-04>>=
 a <- c(1, 2, 3, -5)
 a.na <- c(a, NA)
 SEM(x = a)
-SEM(a)
-SEM(a.na)
+SEM(x = a.na)
 @
 
-For example, with \code{SEM(a)} we are calling function \Rfunction{SEM()} with \code{a} as an argument.
+Our function \code{SEM(a)} never returns a wrong answer because \code{NA} values in its input always result in \code{NA} being returned. The downside is that unlike \Rlang's functions such as \code{var()}, \Rfunction{SEM()} does not support omitting \code{NA} values.
 
-Our function \code{SEM(a)} never returns a wrong answer because \code{NA} values in its input always result in \code{NA} being returned. The problem is that unlike \Rlang's functions such as \code{var()}, \Rfunction{SEM()} does not support omitting \code{NA} values.
-
-This could be implemented by adding a second parameter \code{na.rm} to the definition of our function and passing its argument to the call to \Rfunction{var()} within the body of \code{SEM()}. However, to avoid returning wrong values we need to make sure \code{NA} values are also removed before counting the number of observations with \code{length()}.
-
-A readable way of implementing this in code is to define the function as follows.
+Adding \code{na.rm} as a second parameter and passing the argument it receives to the call to \Rfunction{var()} within the body of \code{SEM()} is not enough. To avoid returning wrong values, \code{NA} values should also be removed before counting the number of observations with \code{length()}. A good alternative is to define the function as follows.
 
 <<fun-05>>=
 sem <- function(x, na.rm = FALSE) {
@@ -276,7 +271,7 @@ Operators are functions that use a different syntax for being called. If their n
 `/`(e1 = 1 , e2 = 2)
 @
 
-An important consequence of the possibility of calling operators using ordinary syntax is that operators can be used as arguments to \emph{apply} functions in the same way as ordinary functions. When passing operator names as arguments to \emph{apply} functions we only need to enclose them in back ticks (see section \ref{sec:data:apply} on page \pageref{sec:data:apply}).
+\Kern{1}{An important consequence of the possibility of calling operators using ordinary syntax is that operators can be used as arguments to \emph{apply} functions in the same way as ordinary functions. When passing operator names as arguments to \emph{apply} functions we only need to enclose them in back ticks (see section \ref{sec:data:apply} on page \pageref{sec:data:apply}).}
 
 The name by itself and enclosed in back ticks allows us to access the definition of an operator.
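As a minimal base R sketch of the two points above (the variable names `sum.a`, `plus.ten` and `ratios` are hypothetical, chosen only for this illustration), operators enclosed in back ticks can be called like ordinary functions and passed as arguments to *apply* functions:

```r
# an operator called using ordinary function-call syntax
sum.a <- `+`(1, 2)

# an operator passed as the function to apply; the extra
# argument 10 is passed on as the operator's second operand
plus.ten <- sapply(1:3, `+`, 10)

# an operator applied element by element over two vectors
ratios <- mapply(`/`, c(2, 4, 6), c(1, 2, 3))

# the name in back ticks by itself gives access to the definition
is.function(`+`)
```

Without the back ticks, R would attempt to parse `+` as part of an expression instead of treating it as the name of a function.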
 
@@ -381,6 +376,9 @@ A specialised \code{print()} method is not available for \code{"derivclass"}, th
 
 <<explain-object-classes-05>>=
 print(b)
+@
+
+<<explain-object-classes-05a>>=
 print(as.numeric(b))
 @
 \end{explainbox}
@@ -456,7 +454,7 @@ For distribution a single compressed archive file is used for each package. Pack
 A key repository for bioinformatics with \Rlang is Bioconductor\index{Bioconductor} (\url{https://www.bioconductor.org/}), containing packages that pass strict quality tests, adding an additional 3\,400 packages. rOpenScience\index{rOpenScience} has established guidelines and a system for code peer review for \Rlang packages. These peer-reviewed packages are available through \CRAN or other repositories and listed at the rOpenScience website (\url{https://ropensci.org/}).
 Occasionally one may need or want to install packages directly from Git repositories, for example versions still under development and not yet submitted to \CRAN.
 
-One good way of learning how the extensions provided by a package work, is by experimenting with them. When using a function we are not yet familiar with, looking at its help to check all its features will expand your ``toolbox''. How much documentation is included with packages varies, while documentation of exported objects is enforced, many packages include, in addition, comprehensive user guides or articles as \emph{vignettes}. It is not unusual to decide which package to use from a set of alternatives based on the quality of available documentation. In the case of packages adding extensive new functionality, they may be documented in depth in a book. Well-known examples are \citebooktitle{Pinheiro2000} \autocite{Pinheiro2000}, \citebooktitle{Sarkar2008} \autocite{Sarkar2008} and \citebooktitle{Wickham2016} \autocite{Wickham2016}.
+A good way of learning how the extensions provided by a package work is to experiment with them. When using a function we are not yet familiar with, looking at its help to check all its features expands our ``toolbox''. While documentation of exported objects is enforced, many packages include, in addition, comprehensive user guides or articles as \emph{vignettes}. It is not unusual to decide which package to use from a set of alternatives based on their documentation. In the case of packages adding extensive new functionality, they may be documented in depth in a book. Well-known examples are \citebooktitle{Pinheiro2000} \autocite{Pinheiro2000} and \citebooktitle{Wickham2016} \autocite{Wickham2016}.
 
 \subsection{Download, installation and use}\label{sec:packages:install}
 
@@ -470,35 +468,36 @@ The instructions below assume that the user has access to repositories on the in
 \begin{faqbox}{How to install or update a package from CRAN?}
 \CRAN is the default repository for \Rlang packages. If you use \RStudio or another IDE as a front end on any operating system or \pgrmname{RGUI} under \pgrmname{MS-Windows}, installation and updates can be done through a menu or GUI `button'. These menus use calls to \Rfunction{install.packages()} and \Rfunction{update.packages()} behind the scenes.
 
-Alternatively, at the \Rpgrm command line, or in a script, \Rfunction{install.packages()} can called with the name of the package as argument. For example, to install package \pkgname{learnrbook} we use
+Alternatively, at the \Rpgrm command line, or in a script, \Rfunction{install.packages()} can be called with the name of the package as an argument. For example, to install package \pkgname{learnrbook} one can use
 
 <<pkg-00, eval=FALSE>>=
 install.packages("learnrbook")
 @
 
-or alternatively, using package \pkgname{pak}.
+\noindent
+and to update already installed packages
 
-<<pak-01, eval=FALSE>>=
-pak::pkg_install("learnrbook")
+<<pkg-00x, eval=FALSE>>=
+update.packages()
 @
-
-Already installed packages are updated with function \Rfunction{update.packages()}.
 \end{faqbox}
 
-\begin{faqbox}{How to install or update \Rlang package from GitHub?}
-Package \pkgname{remotes} makes it possible to install packages directly from \GitHub, \Bitbucket and other code repositories based on \pgrmname{Git}. The code in the next chunk (not run here) can be used to install the latest, possibly, still under development, version of package \pkgname{learnrbook}.
+\begin{faqbox}{How to install or update a package from GitHub?}
+Package \pkgname{remotes} makes it possible to install packages directly from \GitHub, \Bitbucket and other repositories based on \pgrmname{Git}. The code in the next chunk (not run here) can be used to install the latest, possibly still under development, version of package \pkgname{learnrbook}.
 
-<<remotes-01, eval=FALSE>>=
+<<remotes-00y, eval=FALSE>>=
 remotes::install_github("aphalo/learnrbook-pkg")
 @
+\end{faqbox}
 
-Alternatively, the newer package \pkgname{pak} can be used.
+\begin{explainbox}
+Function \Rfunction{pkg\_install()} from \pkgname{pak} can install packages, both from CRAN and Bioconductor repositories, and from \pgrmname{Git} repositories. The same function can be used to update specific already installed packages and dependencies.
 
-<<pak-02, eval=FALSE>>=
-pak::pkg_install("aphalo/learnrbook-pkg")
+<<pak-00z, eval=FALSE>>=
+pak::pkg_install("learnrbook") # from CRAN
+pak::pkg_install("aphalo/learnrbook-pkg") # from GitHub
 @
-
-\end{faqbox}
+\end{explainbox}
 
 \Rpgrm packages can be installed either from sources, or from already built ``binaries''. Installing from sources, depending on the package, may require additional software to be available. This is because some \Rlang packages contain source code in other languages such as \Clang, \Cpplang or \langname{FORTRAN} that needs to be compiled into machine code during installation. Under \pgrmname{MS-Windows}, the needed shell, commands and compilers are not available as part of the operating system. Installing them is not difficult as they are available prepackaged in an installer under the name \pgrmname{RTools} (available from \CRAN). \pgrmnameTwo{\hologo{MiKTeX}}{MiKTeX} is usually needed to build the PDF of the package's manual.
 
@@ -586,13 +585,17 @@ Namespaces isolate the names defined within them from those in other namespaces.
 class(cars)
 head(cars, 3)
 getAnywhere("cars")$where # defined in package
+@
+
+<<pkg-01a, eval=eval_playground>>=
 cars <- 1:10
 class(cars)
 head(cars, 3) # prints 'cars' defined in the global environment
 rm(cars) # clean up
 head(cars, 3)
 getAnywhere("cars")$where # the first visible definition is in the global environment
 @
+
 \end{playground}
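A related sketch, using only base R and the `cars` data set from package `datasets`, shows that an object masked by a user-defined object of the same name remains accessible through the `::` operator. The variable names holding the results (`class.pkg`, `class.masked`, `class.qualified`, `class.restored`) are hypothetical and used only for this illustration.

```r
# 'cars' from package 'datasets' is visible by default
class.pkg <- class(cars)

# an object of the same name in the current environment masks it
cars <- 1:10
class.masked <- class(cars)

# the package's object is still reachable with a qualified name
class.qualified <- class(datasets::cars)

# removing our object makes the package's one visible again
rm(cars)
class.restored <- class(cars)
```

Using the qualified name `datasets::cars` avoids any dependence on what happens to be defined earlier in the search path.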
 
 \begin{warningbox}