Typesetting adjustments 2
A few edits to text, including correction of errors.
aphalo committed Jan 6, 2024
1 parent 1e3a20d commit 6eca12d
Showing 30 changed files with 290,379 additions and 290,123 deletions.
14 changes: 7 additions & 7 deletions R.data.Rnw
@@ -553,9 +553,8 @@ long_iris.tb |>
long_iris.tb
@

In the next few chunks, we print the returned values rather than saving them in variables. In normal use, one would combine these functions into a pipe using operator \Roperator{\textbar >}.

Function \Rfunction{arrange()} is used for sorting the rows---makes sorting a data frame or tibble simpler than by using \Rfunction{sort()} and \Rfunction{order()}. Here, we sort the tibble \code{long\_iris.tb} based on the values in three of its columns.
In the next few chunks, returned values are displayed, while in normal use they would be assigned to variables or passed to the next function in a pipe using \Roperator{\textbar >}.
Function \Rfunction{arrange()} is used to sort rows---it makes sorting a data frame or tibble simpler than when using \Rfunction{sort()} or \Rfunction{order()}. Below, \code{long\_iris.tb} rows are sorted based on the values in three of its columns.

<<tidy-tibble-03>>=
arrange(long_iris.tb, Species, plant_part, part_dimension)
@@ -620,7 +619,8 @@ tibble(numbers = 1:9, Letters = rep(letters[1:3], 3)) |>
median_num = median(numbers),
n = n()) |>
ungroup() # not always needed but safer
@
@%
\pagebreak

In the non-persistent grouping approach, we specify the grouping in the call to \Rfunction{summarise()} (this new feature is labelled as experimental in \pkgname{dplyr} version 1.1.3, and may change in future versions).
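
A minimal sketch of this approach follows, assuming an installed version of \pkgname{dplyr} that supports parameter \code{.by} of \Rfunction{summarise()}. It reuses the example data from the chunk above; the name \code{by\_letter.tb} is only illustrative.

```r
library(dplyr)

# same example data as in the persistent-grouping chunk
by_letter.tb <-
  tibble(numbers = 1:9, Letters = rep(letters[1:3], 3)) |>
  summarise(mean_num = mean(numbers),
            median_num = median(numbers),
            n = n(),
            .by = Letters) # grouping applies only to this call

by_letter.tb
is_grouped_df(by_letter.tb) # the result carries no grouping
```

As the grouping is not stored in the returned tibble, no call to \Rfunction{ungroup()} is needed afterwards.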

@@ -659,7 +659,7 @@ names(attributes(my_gr.tb))
setdiff(attributes(my_gr.tb), attributes(my.tb))
@

A call to \Rfunction{ungroup()} removes the grouping, restoring the original tibble..
A call to \Rfunction{ungroup()} removes the grouping, restoring the original tibble.

<<tibble-grouped-box-03>>=
my_ugr.tb <- ungroup(my_gr.tb)
@@ -726,7 +726,7 @@ right_join(x = first.tb, y = second.tb)
right_join(x = second.tb, y = first.tb)
@

An inner join discards all rows in \code{x} that do not have a matching row in \code{y} and \emph{vice versa}.
An inner join discards rows in \code{x} that do not match rows in \code{y} and \emph{vice versa}.

<<joins-04>>=
inner_join(x = first.tb, y = second.tb)
@@ -736,7 +736,7 @@ inner_join(x = first.tb, y = second.tb)
inner_join(x = second.tb, y = first.tb)
@

Next we apply the \emph{filtering join}\index{joins between data sources!filtering} functions exported by \pkgname{dplyr}: \Rfunction{semi\_join()} and \Rfunction{anti\_join()}. These functions only return a tibble that always contains only the columns from \code{x}, but retains rows based on their match to rows in \code{y}.
Next we apply the \emph{filtering join}\index{joins between data sources!filtering} functions exported by \pkgname{dplyr}: \Rfunction{semi\_join()} and \Rfunction{anti\_join()}. These functions return a tibble that contains only the columns from \code{x}, retaining rows based on their match to rows in \code{y}.

A semi join retains rows from \code{x} that have a match in \code{y}.
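
The difference between the two filtering joins can be sketched with two small tibbles. The tibbles \code{x.tb} and \code{y.tb} and their columns are made up for this illustration and are not the \code{first.tb} and \code{second.tb} used in the chapter.

```r
library(dplyr)

# hypothetical example data sharing the key column 'idx'
x.tb <- tibble(idx = 1:4, val = c("a", "b", "c", "d"))
y.tb <- tibble(idx = c(2L, 4L, 6L), flag = c(TRUE, FALSE, TRUE))

# rows of x.tb with a match in y.tb; only columns from x.tb are returned
kept.tb <- semi_join(x = x.tb, y = y.tb, by = "idx")
kept.tb

# rows of x.tb without a match in y.tb
dropped.tb <- anti_join(x = x.tb, y = y.tb, by = "idx")
dropped.tb
```

Together, the two results partition the rows of \code{x.tb}, and neither contains column \code{flag} from \code{y.tb}.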

12 changes: 6 additions & 6 deletions R.data.containers.Rnw
@@ -733,7 +733,7 @@ Related to splitting a data frame is the calculation of summaries based on a sub
To summarise a single variable by group we can use \Rfunction{aggregate()}.

<<faq-aggregate-01>>=
aggregate(x = iris$Petal.Length,
aggregate(x = iris$Petal.Length,
by = list(iris$Species), FUN = mean)
@

@@ -743,7 +743,7 @@ aggregate(x = iris$Petal.Length,
To summarise variables we can use \Rfunction{aggregate()} (see section \ref{sec:dplyr:group:wise} on page \pageref{sec:dplyr:group:wise} for an alternative approach using package \pkgnameNI{dplyr}).

<<faq-aggregate-02>>=
aggregate(x = iris[ , sapply(iris, is.numeric)],
aggregate(x = iris[ , sapply(iris, is.numeric)],
by = list(iris$Species), FUN = mean)
@
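
Base \Rlang's \Rfunction{aggregate()} also has a formula method. The sketch below is not part of the original chunks; it expresses the same two summaries with formulas, and the names \code{petal\_means.df} and \code{all\_means.df} are only illustrative.

```r
# one variable summarised by group, using a formula
petal_means.df <- aggregate(Petal.Length ~ Species, data = iris, FUN = mean)
petal_means.df

# all remaining variables summarised by group; '.' stands for
# every column of 'iris' not named on the right-hand side
all_means.df <- aggregate(. ~ Species, data = iris, FUN = mean)
all_means.df
```

With the formula method, the grouping column keeps its name \code{Species} in the returned data frame.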

@@ -869,15 +869,15 @@ In this example, column \code{A} of \code{df14} takes precedence, and the return
<<data-frames-EB-12>>=
df14 <- data.frame(A = 1:10, B = 3)
df14$C <- with(df14, (A + B) / A) # add column
head(df14, 2)
head(df14, 3)
@

In the case of \Rscoping{within()}, assignments in the argument to its second parameter affect the object returned, which is a copy of the container (in this case, a whole data frame), which still needs to be saved through assignment. Here the intention is to modify it, so we assign it back to the same name, but it could have been assigned to a different name so as not to overwrite the original data frame.

<<data-frames-EB-13>>=
df14$C <- NULL
df15 <- within(df14, C <- (A + B) / A) # modified copy
head(df15, 2)
head(df15, 3)
@

In the example above, using \code{within()} instead of \Rscoping{with()} makes little difference to the amount of typing or clarity of the code, but with multiple member variables being operated upon, as shown below, using \Rscoping{within()} results in more concise and easier to understand code.
@@ -888,7 +888,7 @@ df16 <- within(df14,
D <- A * B
E <- A / B + 1}
)
head(df16, 2)
head(df16, 3)
@

\begin{explainbox}
@@ -898,7 +898,7 @@ Repeatedly pre-pending the name of a \emph{container} such as a list or data fra
df14$C <- (df14$A + df14$B) / df14$A
df14$D <- df14$A * df14$B
df14$E <- df14$A / df14$B + 1
head(df14, 2)
head(df14, 3)
@

Using\index{data frames!attaching}\label{par:calc:attach} \Rscoping{attach()} we can alter where \Rlang looks up names and consequently simplify the statement. With \Rscoping{detach()} we can restore the original state. It is important to remember that here we can only simplify the right-hand side of the assignment, while the ``destination'' of the result of the computation still needs to be fully specified on the left-hand side of the assignment operator. We include below only one statement between \Rscoping{attach()} and \Rscoping{detach()} but multiple statements are allowed. Furthermore, if variables with the same name as the columns exist in the search path, these will take precedence, something that can result in bugs or crashes, or as seen below, a message warns that variable \code{A} from the global environment will be used instead of column \code{A} of the attached \code{df17}. The returned value is, of course, not the desired one.
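
The basic pattern, without any masking variable in the global environment, can be sketched as follows. The contents given here to \code{df17} are an assumption made for illustration, mirroring \code{df14}.

```r
# hypothetical contents for df17, mirroring df14
df17 <- data.frame(A = 1:10, B = 3)

attach(df17)           # columns A and B become visible by name
df17$C <- (A + B) / A  # left-hand side must still name the data frame
detach(df17)           # restore the original search path

head(df17, 3)
```

Because \Rscoping{attach()} exposes a copy of the data frame, the new column \code{C} is added to \code{df17} itself and not to the attached copy.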
7 changes: 4 additions & 3 deletions R.data.io.Rnw
@@ -836,7 +836,8 @@ longitude <- ncvar_get(meteo_data.nc, "lon")
head(longitude)
latitude <- ncvar_get(meteo_data.nc, "lat")
head(latitude)
@
@%
\pagebreak

The \code{time} vector contains only monthly values as the file contains a long-term series of monthly averages, expressed as days from 1800-01-01 corresponding to the first day of each month of year "1". We use package \pkgname{lubridate} for the conversion. To find the indexes for the grid point of interest, it is necessary to study the vectors \code{longitude} and \code{latitude} saved above.
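
The arithmetic behind such a conversion can be sketched with base \Rlang alone, although the chapter's own code uses \pkgname{lubridate}. The offsets below are made-up values chosen for illustration, not those stored in the file.

```r
# hypothetical offsets, in days since 1800-01-01 (not values from the file)
time_offsets <- c(0, 31, 59)

# interpret the offsets relative to the origin date
month_starts <- as.Date(time_offsets, origin = "1800-01-01")
month_starts
```

As 1800 was not a leap year, these three offsets fall on the first day of January, February and March.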

@@ -925,12 +926,12 @@ While functions in package \pkgname{readr} support the use of URLs, those in pac
For portability, \pgrmname{MS-Excel} files should be downloaded in binary mode, setting \code{mode = "wb"}, which is required under \osname{MS-Windows}.
\end{warningbox}


<<url-11, eval=eval_online_data>>=
download.file("http://r4photobiology.info/learnr/my-data.xlsx",
"data/my-data-dwn.xlsx",
mode = "wb")
@
@%
\pagebreak

Functions in package \pkgname{foreign}, as well as those in package \pkgname{haven}, support URLs. See section \ref{sec:files:stat} on page \pageref{sec:files:stat} for more information about importing this kind of data into \Rlang.

53 changes: 28 additions & 25 deletions R.functions.Rnw
@@ -198,23 +198,18 @@ After the toy examples above, we will define a small but useful function: a func
SEM <- function(x){sqrt(var(x) / length(x))}
@

We can test our function.
As a test, we call \Rfunction{SEM()} with both \code{a} and \code{a.na} as arguments.

<<fun-04>>=
a <- c(1, 2, 3, -5)
a.na <- c(a, NA)
SEM(x = a)
SEM(a)
SEM(a.na)
SEM(x = a.na)
@

For example, with \code{SEM(a)} we are calling function \Rfunction{SEM()} with \code{a} as an argument.
Our function \code{SEM(a)} never returns a wrong answer because \code{NA} values in its input always result in \code{NA} being returned. The downside is that unlike \Rlang's functions such as \code{var()}, \Rfunction{SEM()} does not support omitting \code{NA} values.

Our function \code{SEM(a)} never returns a wrong answer because \code{NA} values in its input always result in \code{NA} being returned. The problem is that unlike \Rlang's functions such as \code{var()}, \Rfunction{SEM()} does not support omitting \code{NA} values.

This could be implemented by adding a second parameter \code{na.rm} to the definition of our function and passing its argument to the call to \Rfunction{var()} within the body of \code{SEM()}. However, to avoid returning wrong values we need to make sure \code{NA} values are also removed before counting the number of observations with \code{length()}.

A readable way of implementing this in code is to define the function as follows.
Adding \code{na.rm} as a second parameter and passing the argument it receives to the call to \Rfunction{var()} within the body of \code{SEM()} is not enough. To avoid returning wrong values, \code{NA} values should also be removed before counting the number of observations with \code{length()}. A good alternative is to define the function as follows.
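
As a side illustration before that definition, the sketch below shows why forwarding \code{na.rm} only to \Rfunction{var()} falls short. Function \code{sem\_naive()} and variable \code{sem\_ref} are hypothetical names introduced here and are not part of the book's code.

```r
a <- c(1, 2, 3, -5)
a.na <- c(a, NA)

# hypothetical naive version: na.rm reaches var() but not length()
sem_naive <- function(x, na.rm = FALSE) {
  sqrt(var(x, na.rm = na.rm) / length(x))
}

# reference value, computed from the vector without the NA
sem_ref <- sqrt(var(a) / length(a))

sem_naive(a.na, na.rm = TRUE) # too small: the NA is still counted in n
sem_ref
```

The naive version divides the variance of four observations by five, which is why it underestimates the standard error.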

<<fun-05>>=
sem <- function(x, na.rm = FALSE) {
@@ -276,7 +271,7 @@ Operators are functions that use a different syntax for being called. If their n
`/`(e1 = 1 , e2 = 2)
@

An important consequence of the possibility of calling operators using ordinary syntax is that operators can be used as arguments to \emph{apply} functions in the same way as ordinary functions. When passing operator names as arguments to \emph{apply} functions we only need to enclose them in back ticks (see section \ref{sec:data:apply} on page \pageref{sec:data:apply}).
\Kern{1}{An important consequence of the possibility of calling operators using ordinary syntax is that operators can be used as arguments to \emph{apply} functions in the same way as ordinary functions. When passing operator names as arguments to \emph{apply} functions we only need to enclose them in back ticks (see section \ref{sec:data:apply} on page \pageref{sec:data:apply}).}
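
A few base \Rlang calls sketch this use of back-ticked operator names. The input vectors are arbitrary, and \Rfunction{Reduce()}, although not an \emph{apply} function, accepts operators in the same way.

```r
# add 10 to each member of a vector, passing `+` as the function
sapply(1:3, `+`, 10)

# member-by-member division of two vectors with mapply()
mapply(`/`, c(2, 4, 6), c(1, 2, 3))

# product of all members, obtained by folding `*` over the vector
Reduce(`*`, 1:5)
```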

The name by itself and enclosed in back ticks allows us to access the definition of an operator.

@@ -381,6 +376,9 @@ A specialised \code{print()} method is not available for \code{"derivclass"}, th

<<explain-object-classes-05>>=
print(b)
@

<<explain-object-classes-05a>>=
print(as.numeric(b))
@
\end{explainbox}
@@ -456,7 +454,7 @@ For distribution a single compressed archive file is used for each package. Pack
A key repository for bioinformatics with \Rlang is Bioconductor\index{Bioconductor} (\url{https://www.bioconductor.org/}), containing packages that pass strict quality tests, adding an additional 3\,400 packages. rOpenScience\index{rOpenScience} has established guidelines and a system for code peer review for \Rlang packages. These peer-reviewed packages are available through \CRAN or other repositories and listed at the rOpenScience website (\url{https://ropensci.org/}).
Occasionally one may have or want to install packages directly from Git repositories such as versions still under development and not yet submitted to \CRAN.

One good way of learning how the extensions provided by a package work, is by experimenting with them. When using a function we are not yet familiar with, looking at its help to check all its features will expand your ``toolbox''. How much documentation is included with packages varies, while documentation of exported objects is enforced, many packages include, in addition, comprehensive user guides or articles as \emph{vignettes}. It is not unusual to decide which package to use from a set of alternatives based on the quality of available documentation. In the case of packages adding extensive new functionality, they may be documented in depth in a book. Well-known examples are \citebooktitle{Pinheiro2000} \autocite{Pinheiro2000}, \citebooktitle{Sarkar2008} \autocite{Sarkar2008} and \citebooktitle{Wickham2016} \autocite{Wickham2016}.
A good way of learning how the extensions provided by a package work is experimenting with them. When using a function we are not yet familiar with, looking at its help to check all its features expands our ``toolbox''. While documentation of exported objects is enforced, many packages include, in addition, comprehensive user guides or articles as \emph{vignettes}. It is not unusual to decide which package to use from a set of alternatives based on its documentation. In the case of packages adding extensive new functionality, they may be documented in depth in a book. Well-known examples are \citebooktitle{Pinheiro2000} \autocite{Pinheiro2000} and \citebooktitle{Wickham2016} \autocite{Wickham2016}.

\subsection{Download, installation and use}\label{sec:packages:install}

@@ -470,35 +468,36 @@ The instructions below assume that the user has access to repositories on the in
\begin{faqbox}{How to install or update a package from CRAN?}
\CRAN is the default repository for \Rlang packages. If you use \RStudio or another IDE as a front end on any operating system or \pgrmname{RGUI} under \pgrmname{MS-Windows}, installation and updates can be done through a menu or GUI `button'. These menus use calls to \Rfunction{install.packages()} and \Rfunction{update.packages()} behind the scenes.

Alternatively, at the \Rpgrm command line, or in a script, \Rfunction{install.packages()} can called with the name of the package as argument. For example, to install package \pkgname{learnrbook} we use
Alternatively, at the \Rpgrm command line, or in a script, \Rfunction{install.packages()} can be called with the name of the package as an argument. For example, to install package \pkgname{learnrbook} one can use

<<pkg-00, eval=FALSE>>=
install.packages("learnrbook")
@

or alternatively, using package \pkgname{pak}.
\noindent
and to update already installed packages

<<pak-01, eval=FALSE>>=
pak::pkg_install("learnrbook")
<<pkg-00x, eval=FALSE>>=
update.packages()
@

Already installed packages are updated with function \Rfunction{update.packages()}.
\end{faqbox}

\begin{faqbox}{How to install or update \Rlang package from GitHub?}
Package \pkgname{remotes} makes it possible to install packages directly from \GitHub, \Bitbucket and other code repositories based on \pgrmname{Git}. The code in the next chunk (not run here) can be used to install the latest, possibly, still under development, version of package \pkgname{learnrbook}.
\begin{faqbox}{How to install or update a package from GitHub?}
Package \pkgname{remotes} makes it possible to install packages directly from \GitHub, \Bitbucket and other repositories based on \pgrmname{Git}. The code in the next chunk (not run here) can be used to install the latest, possibly still under development, version of package \pkgname{learnrbook}.

<<remotes-01, eval=FALSE>>=
<<remotes-00y, eval=FALSE>>=
remotes::install_github("aphalo/learnrbook-pkg")
@
\end{faqbox}

Alternatively, the newer package \pkgname{pak} can be used.
\begin{explainbox}
Function \Rfunction{pkg\_install()} from \pkgname{pak} can install packages, both from CRAN and Bioconductor repositories, and from \pgrmname{Git} repositories. The same function can be used to update specific already installed packages and dependencies.

<<pak-02, eval=FALSE>>=
pak::pkg_install("aphalo/learnrbook-pkg")
<<pak-00z, eval=FALSE>>=
pak::pkg_install("learnrbook") # from CRAN
pak::pkg_install("aphalo/learnrbook-pkg") # from GitHub
@

\end{faqbox}
\end{explainbox}

\Rpgrm packages can be installed either from sources, or from already built ``binaries''. Installing from sources, depending on the package, may require additional software to be available. This is because some \Rlang packages contain source code in other languages such as \Clang, \Cpplang or \langname{FORTRAN} that needs to be compiled into machine code during installation. Under \pgrmname{MS-Windows}, the needed shell, commands and compilers are not available as part of the operating system. Installing them is not difficult as they are available prepackaged in an installer under the name \pgrmname{RTools} (available from \CRAN). \pgrmnameTwo{\hologo{MiKTeX}}{MiKTeX} is usually needed to build the PDF of the package's manual.

@@ -586,13 +585,17 @@ Namespaces isolate the names defined within them from those in other namespaces.
class(cars)
head(cars, 3)
getAnywhere("cars")$where # defined in package
@

<<pkg-01a, eval=eval_playground>>=
cars <- 1:10
class(cars)
head(cars, 3) # prints 'cars' defined in the global environment
rm(cars) # clean up
head(cars, 3)
getAnywhere("cars")$where # the first visible definition is in the global environment
@

\end{playground}
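
While a masking definition exists, the object in the package remains reachable through a qualified name. The sketch below assumes only that \code{cars} is the data set exported by package \pkgname{datasets}, part of every \Rlang installation.

```r
cars <- 1:10          # masks the data set in the global environment
class(cars)           # class of the masking integer vector

class(datasets::cars) # the qualified name bypasses the masking
head(datasets::cars, 3)

rm(cars)              # clean up, removing the masking definition
class(cars)           # the package's data frame is visible again
```

Operator \code{::} looks up the name directly among the objects exported by the package, so the search path plays no role in the qualified calls.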

\begin{warningbox}