Fix errors in index entries

I noticed some errors in the log of makeindex and fixed them. Added script for wordcloud for cover. Removed use of polyglossia as it was triggering errors and the whole book is in English.
aphalo · Nov 23, 2019 · 197853b
1 parent 5fbf0eb
commit 197853b

Showing 24 changed files with 281,833 additions and 282,618 deletions.
diff --git a/CRC/covers/learnrbook-back-cover-and-abstracts.docx b/CRC/covers/learnrbook-back-cover-and-abstracts.docx
diff --git a/CRC/covers/learnrbook-cover-image.png b/CRC/covers/learnrbook-cover-image.png
diff --git a/CRC/make-cover-wordcloud.R b/CRC/make-cover-wordcloud.R
@@ -0,0 +1,63 @@
+library(ngram)
+library(ggplot2)
+library(ggwordcloud)
+library(dplyr)
+library(tidytext)
+library(stringr)
+library(wrapr)
+
+getwd()
+list.files(path = ".", pattern = "\\.idx$")
+indexed.words <- multiread(extension=".idx", prune.empty = FALSE)
+
+get_words <- function(x) {
+  # remove LaTeX commands
+  gsub("\\\\textsf|\\\\textit|\\\\texttt|\\\\indexentry|\\\\textbar|\\\\ldots", "", x) -> temp
+  # replace escaped characters
+  gsub("\\\\_", "_", temp) -> temp
+  gsub('\\\\"|\\"|\"', '"', temp) -> temp
+  gsub("\\\\%", "%", temp) -> temp
+  gsub("\\\\$|\\$", "$", temp) -> temp
+  gsub("\\\\&|\\&", "&", temp) -> temp
+  gsub("\\\\^|\\^", "^", temp) -> temp
+  # remove index categories
+  gsub("[{]functions and methods!|[{]classes and modes!|[{]data objects!|[{]operators!|[{]control of execution!|[{]names and their scope!|[{]constant and special values!", "", temp) -> temp
+  # remove page numbers
+  gsub("[{][0-9]*[}]", "", temp) -> temp
+  # remove LaTeX formatted versions of index entries
+  gsub("@  [{][a-zA-Z_.:0-9$<-]*[(][])][}][}]", "", temp) -> temp
+  gsub("@  [{][-a-zA-Z_.:0-9$<+*/>&^\\]*[}][}]", "", temp) -> temp
+  gsub("@  [{][\\<>.!=, \"%[]*]*[}][}]", "", temp)
+}
+
+assign(sub("./", "", names(indexed.words)[1]), get_words(indexed.words[[1]]))
+
+string.summary(rcatsidx.idx)
+
+str_replace(rcatsidx.idx, "@", "") %.>%
+  str_replace(., '\\"|\\\\"|\"', "") %.>%
+  str_replace(., '\\\\$"', "$") %.>%
+  str_replace(., "^[{]", "") %.>%
+  str_replace(., "[}][}]$", "") %.>%
+  str_split(., " ") %.>%
+  unlist(.) %.>%
+  sort(.) %.>%
+  rle(.) %.>%
+  tibble(lengths = .$lengths, values = .$values) %.>%
+  filter(., !values %in% c("", "NA", "\\$")) %.>%
+  mutate(., values = ifelse(values %in% c("{%in%}}","{%in%}", "%in%@"), "%in%", values)) %.>%
+  mutate(., values = ifelse(values %in% c("{levels()<-}}","{levels()<-}", "levels()<-@"), "levels()<-", values)) %.>%
+  group_by(., values) %>%
+  summarise(., lengths = sum(lengths)) %>%
+  dplyr::arrange(., desc(lengths)) -> word_counts.tb
+
+nrow(word_counts.tb)
+
+set.seed(42)
+ggplot(word_counts.tb[1:140, ], aes(label = values, size = lengths, color = lengths)) +
+  geom_text_wordcloud(family = "mono", fontface = "bold", area_corr = TRUE) +
+  scale_size_area(max_size = 10) +
+  scale_color_viridis_c() +
+  theme_minimal() +
+  theme(aspect.ratio = 3/4,
+        panel.background = element_rect(fill = "black"))
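The counting step in the script above sorts the extracted tokens and run-length encodes them with `rle()`, so that equal tokens collapse into one row with a count. A minimal base-R sketch of that idea follows; the `tokens` vector and the helper `count_tokens()` are made up for illustration and are not part of the commit.

```r
# Count how often each token occurs by sorting and run-length encoding.
# `tokens` and `count_tokens()` are illustrative names, not taken from the commit.
count_tokens <- function(tokens) {
  runs <- rle(sort(tokens))  # after sorting, equal tokens form adjacent runs
  counts <- data.frame(values = runs$values,
                       lengths = runs$lengths,
                       stringsAsFactors = FALSE)
  counts <- counts[counts$values != "", , drop = FALSE]  # drop empty tokens
  # most frequent first, ties broken by token name
  counts[order(-counts$lengths, counts$values), , drop = FALSE]
}

tokens <- c("mean()", "%in%", "mean()", "", "sd()", "mean()", "%in%")
word_counts <- count_tokens(tokens)
print(word_counts)
```

The resulting data frame has the same two columns, `values` and `lengths`, that the script passes to `ggplot()` for the word cloud.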
diff --git a/R.as.calculator.Rnw b/R.as.calculator.Rnw
@@ -569,7 +569,7 @@ We see next that the exponentiation operator \Roperator{\^{}} forces the promoti
 \end{explainbox}
 
 \begin{warningbox}
-\index{comparison of floating point numbers|(}\index{inequality and equality tests|(}\index{loss of numeric precision}\index{}In many situations, when writing programs one should avoid testing for equality of floating point numbers (`floats'). Here we show how to handle gracefully rounding errors. As the example shows, rounding errors may accumulate, and in practice \verb|.Machine$double.eps| is not always a good value to safely use in tests for ``zero'', a larger value may be needed. Whenever possible according to the logic of the calculations, it is best to test for inequalities, for example using \verb|x <= 1.0| instead of \verb|x == 1.0|. If this is not possible, then the tests should be done replacing tests like \verb|x == 1.0| with \verb|abs(x - 1.0) < eps|. Function \Rfunction{abs()} returns the absolute value, in simple words, makes all values positive or zero, by changing the sign of negative values, or in mathematical notation $|x| = |-x|$.
+\index{comparison of floating point numbers|(}\index{inequality and equality tests|(}\index{loss of numeric precision}In many situations, when writing programs one should avoid testing for equality of floating point numbers (`floats'). Here we show how to handle gracefully rounding errors. As the example shows, rounding errors may accumulate, and in practice \verb|.Machine$double.eps| is not always a good value to safely use in tests for ``zero'', a larger value may be needed. Whenever possible according to the logic of the calculations, it is best to test for inequalities, for example using \verb|x <= 1.0| instead of \verb|x == 1.0|. If this is not possible, then the tests should be done replacing tests like \verb|x == 1.0| with \verb|abs(x - 1.0) < eps|. Function \Rfunction{abs()} returns the absolute value, in simple words, makes all values positive or zero, by changing the sign of negative values, or in mathematical notation $|x| = |-x|$.
 
 <<machine-eps-06>>=
 a == 0.0 # may not always work
@@ -874,7 +874,7 @@ a[3] <- b[2]
 @
 \end{explainbox}
 
-\index{type conversion|(}
+\index{type conversion|)}
 
 \section{Vector manipulation}\label{sec:vectors}\label{sec:calc:indexing}
 \index{vectors!indexing|(}\index{vectors!member extraction}
@@ -1411,7 +1411,7 @@ a.list
 @
 
 \subsection{Member extraction and subsetting}
-Using\qRoperator{[[]]}\index{lists!member extraction|(}\index{lists!member indexing|see{lists!member extraction}}\index{lists!indexes|see{lists!member extraction}} double square brackets for indexing a list extracts the element stored in the list, in its original mode, in the example above, \code{a.list[["x"]]} returns a numeric vector, while \code{a.list[1]} returns a list containing the numeric vector \code{x}. \code{a.list\$x} returns the same value as \code{a.list[["x"]]}, a numeric vector. While \code{a.list[c(1,3)]} returns a list of length two, while \code{a.list[[c(1,3)]]} is an error.
+Using\qRoperator{[[]]}\index{lists!member extraction|(}\index{lists!member indexing|see{member extraction}}\index{lists!indexes|see{member extraction}} double square brackets for indexing a list extracts the element stored in the list, in its original mode, in the example above, \code{a.list[["x"]]} returns a numeric vector, while \code{a.list[1]} returns a list containing the numeric vector \code{x}. \code{a.list\$x} returns the same value as \code{a.list[["x"]]}, a numeric vector. While \code{a.list[c(1,3)]} returns a list of length two, while \code{a.list[[c(1,3)]]} is an error.
 
 <<lists-1>>=
 a.list$x

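The warning box in the first hunk of R.as.calculator.Rnw recommends replacing tests like `x == 1.0` with `abs(x - 1.0) < eps`. A small sketch of that advice, assuming a hypothetical helper `near_equal()` and an arbitrarily chosen default tolerance:

```r
# Compare floating point numbers with a tolerance instead of `==`.
# `near_equal()` and its default `eps` are illustrative choices, not from the book.
near_equal <- function(x, y, eps = 1e-12) {
  abs(x - y) < eps
}

a <- 0.1 + 0.2

a == 0.3                   # exact test: fails because of rounding error
near_equal(a, 0.3)         # tolerance-based test
isTRUE(all.equal(a, 0.3))  # base R's own tolerance-based comparison
```

A suitable `eps` depends on the magnitude of the values and on how many operations have accumulated rounding error, which is why `.Machine$double.eps` itself is often too small.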
diff --git a/R.data.io.Rnw b/R.data.io.Rnw
@@ -17,7 +17,7 @@ Most programmers have seen them, and most good programmers realize they've writt
 <<echo=FALSE>>=
 # set to TRUE to test non-executed code chunks and rendering of plots
 eval_online_data <- TRUE
-eval_yoctopuce <- TRUE
+eval_yoctopuce <- FALSE
 @
 
 \section{Aims of this chapter}
@@ -437,7 +437,7 @@ Function \Rfunction{read\_delim()} with space as delimiter needs to be used.
 read_delim(file = "extdata/not-aligned-ASCII.txt", delim = " ")
 @
 
-Function \Rfunction{read\_tsv()} reads files with tab character as delimiter, and \Rfunction{read\_fwf())} reads files with fixed width fields. There is, however, no equivalent to \Rfunction{read.fortran()}, supporting implicit decimal points.
+Function \Rfunction{read\_tsv()} reads files with tab character as delimiter, and \Rfunction{read\_fwf()} reads files with fixed width fields. There is, however, no equivalent to \Rfunction{read.fortran()}, supporting implicit decimal points.
 
 \begin{playground}
 Use the "wrong" \code{read\_} functions to read the example files used above and/or your own files. As mentioned earlier forcing errors will help you learn how to diagnose when such errors are caused by coding mistakes. In this case, as wrongly read data are not always accompanied by error or warning messages, carefully check the returned tibbles for misread data values.

diff --git a/R.intro.Rnw b/R.intro.Rnw
@@ -52,7 +52,7 @@ Some languages have been standardised, and their grammar has been formally defin
 \end{explainbox}
 
 \subsection{R as a computer program}
-\index{R as a computer program@{\textsf{R} as a computer program}}
+\index{R as a computer program@{\Rpgrm as a computer program}}
 \index{Windows@{\textsf{Windows}}|see{MS-Windows@{\textsf{MS-Windows}}}}
 The \Rpgrm program itself is open-source, the source code is available for anybody to inspect, modify and use. A small fraction of users will directly contribute improvements to the \Rpgrm program itself, but it is possible, and those contributions are important in making \Rpgrm reliable. The executable, the \Rpgrm program we actually use, can be built for different operating systems and computer hardware. The members of the \Rpgrm developing team make an important effort to keep the results obtained from calculations done on all the different builds and computer architectures as consistent as possible. The aim is to ensure that computations return consistent results not only across updates to \Rpgrm but also across different operating systems like \osname{Linux}, \osname{Unix} (including \osname{OS X}), and \osname{MS-Windows}, and computer hardware.
 

diff --git a/R.plotting.Rnw b/R.plotting.Rnw
@@ -78,7 +78,7 @@ eval_plots_all <- FALSE
 @
 
 \section{Introduction to the grammar of graphics}
-\index{!elements|(}
+\index{grammar of graphics!elements|(}
 What separates \ggplot from base \Rlang and trellis/lattice plotting functions is the use of a grammar of graphics\index{grammar of graphics} (the reason behind `gg' in the name of package \pkgname{ggplot2}). What is meant by grammar in this case is that plots are assembled piece by piece using different `nouns' and `verbs' \autocite{Cleveland1985}. Instead of using a single function with many arguments, plots are assembled by combining different elements with operators \code{+} and \verb|%+%|. Furthermore, the construction is mostly semantic-based and to a large extent how plots look when is printed, displayed or exported to a bitmap or vector-graphics file is controlled by themes.
 
 We can think of plotting as representing the observations or data in a graphical language. We use the properties of graphical objects to represent different aspects of our data. An observation can consist in multiple values recorded. Say an observation of air temperature may be defined by a position in 3-dimensional space and a point in time, in addition to the temperature itself. An observation for the size and shape of a plant can consist in height, stem diameter, number of leaves, size of individual leaves, length of roots, fresh mass, dry mass, etc. If we are interested in the relationship between height and stem diameter, we may want to use cartesian coordinates\index{grammar of graphics!cartesian coordinates}, \emph{mapping} stem diameter to the $x$ dimension of the plot and the height to the $y$ dimension. The observations could be represented on the plot by points and/or joined by lines.
@@ -1247,7 +1247,6 @@ Package \pkgname{ggpmisc} provides additional \emph{statistics} for the annotati
 
 \subsection{Frequencies and counts}\label{sec:histogram}\label{sec:plot:histogram}
 \index{plots!histograms|(}
-\index{density plots|(}
 
 Frequencies and empirical density functions are a different type of summary. These can be calculated in one or more dimensions. Sometimes instead of being calculated, we rely on the density of graphical elements to convey the density of the observations. For example, scatter plots using well chosen values for \code{alpha} can give a satisfactory impression of the density. Rug plots, described in section \ref{sec:plot:rug} on page \pageref{sec:plot:rug}, can also satisfactorily convey the density of observations along $x$ and/or $y$ axes. Such approaches do not involve computations, while the \emph{statistics} described in this section do.
 
@@ -1539,7 +1538,7 @@ opts_chunk$set(opts_fig_wide)
 \index{grammar of graphics!facets|)}
 
 \section{Scales}\label{sec:plot:scales}
-\index{grammar of graphics!aesthetic scales|(}
+\index{grammar of graphics!scales|(}
 
 In earlier sections of this chapter examples have used the default \emph{scales} or we have set them with convenience functions. In the present section we describe in more details the use of \emph{scales}. There are \emph{scales} available for different \emph{aesthetics} ($\approx$ attributes) of the plotted geometrical objects, such as position (\code{x, y, z}), \code{size}, \code{shape}, \code{linetype}, \code{colour}, \code{fill}, \code{alpha} or transparency, \code{angle}. Scales determine how values in \code{data} are mapped to values of an \emph{aesthetics}, and how these values are labelled.
 
@@ -1566,6 +1565,7 @@ A continuous data variable needs to be mapped to an \emph{aesthetic} through a c
 \index{plots!title|(}
 \index{plots!subtitle|(}
 \index{plots!tag|(}
+\index{plots!caption|(}
 First we describe a feature common to all scales, their \code{name}. The default \code{name} of all scales is the name of the variable or the expression mapped to it. In the case of the \code{x}, \code{y} and \code{z} \emph{aesthetics} the \code{name} given to the scale is used for the axis labels. For other \emph{aesthetics} the name of the scale becomes the ``heading'' or \emph{key title} of the guide or key. All scales have a \code{name} parameter to which a character string or \Rlang expression (see section \ref{sec:plot:plotmath}) can be passed as argument to override the default.
 
 Whole-plot title, subtitle and caption are not connected to \emph{scales} or \code{data}. A title (\code{label}) and \code{subtitle} can be added least confusingly with function \Rfunction{ggtitle()} by passing either character strings or \Rlang expressions as arguments.
@@ -1717,7 +1717,7 @@ ggplot(fake2.data, aes(z, y)) + geom_point() +
 
 \subsubsection{Ticks and their labels}\label{sec:plot:scales:ticks}
 
-Parameter \code{breaks}\index{plots!scales!continuous!tick breaks} is used to set the location of ticks along the axis. Parameter \code{labels}\index{plots!scales!continuous!tick labels} is used to set the tick labels. Both parameters can be passed either a vector or a function as argument. The default is to compute ``good'' breaks based on the limits and format the numbers as strings.
+Parameter \code{breaks}\index{plots!scales!tick breaks} is used to set the location of ticks along the axis. Parameter \code{labels}\index{plots!scales!tick labels} is used to set the tick labels. Both parameters can be passed either a vector or a function as argument. The default is to compute ``good'' breaks based on the limits and format the numbers as strings.
 
 When manually setting breaks, we can keep the default computed labels for the \code{breaks}.
 
@@ -1756,7 +1756,7 @@ In the case of currency we can use \code{scales::dollar()}, to use commas to sep
 
 \subsubsection{Transformed scales}\label{sec:plot:scales:trans}
 
-The\index{plots!scales!continuous!transformations} default scales used by the \code{x} and \code{y} aesthetics, \ggscale{scale\_x\_continuous()} and \ggscale{scale\_y\_continuous()}, accept a user-supplied transformation function as argument to \code{trans} with default code{trans = "identity"} (no transformation). In addition there are predefined convenience scale functions for \code{log10}, \code{sqrt} and \code{reverse}.
+The\index{plots!scales!transformations} default scales used by the \code{x} and \code{y} aesthetics, \ggscale{scale\_x\_continuous()} and \ggscale{scale\_y\_continuous()}, accept a user-supplied transformation function as argument to \code{trans} with default \code{trans = "identity"} (no transformation). In addition there are predefined convenience scale functions for \code{log10}, \code{sqrt} and \code{reverse}.
 
 \begin{warningbox}
   Similarly to the maths functions of R, the name of the scales are \ggscale{scale\_x\_log10()} and \ggscale{scale\_y\_log10()} rather than \ggscale{scale\_y\_log()} because in R the function \code{log} returns the natural or Neperian logarithm.
@@ -1858,7 +1858,7 @@ ggplot(data = weather_wk_25_2019.tb,
   expand_limits(y = 0)
 @
 
-By\index{plots!scales!time!axis labels} default the tick labels produced and their formatting is automatically selected based on the extent of the time data. For example, if we have all data collected within a single day, then the tick labels will show hours and minutes. If we plot data for several years, the labels will show the date portion of the time instant. The default is frequently good enough, but it is possible, as for numbers to use different formatter functions to generate the tick labels.
+By\index{plots!scales!axis labels} default the tick labels produced and their formatting are automatically selected based on the extent of the time data. For example, if we have all data collected within a single day, then the tick labels will show hours and minutes. If we plot data for several years, the labels will show the date portion of the time instant. The default is frequently good enough, but it is possible, as for numbers, to use different formatter functions to generate the tick labels.
 
 <<scale-datetime-02>>=
 ggplot(data = weather_wk_25_2019.tb,
@@ -2267,7 +2267,7 @@ old_theme <- theme_update(text = element_text(color = "darkred"))
 <<themes-16, eval=eval_plots_all, echo=FALSE>>=
 theme_set(old_theme)
 @
-\index{grammar of graphics!incomplete themes|(}
+\index{grammar of graphics!incomplete themes|)}
 
 \subsection{Defining a new theme}
 \index{grammar of graphics!creating a theme|(}
@@ -2582,7 +2582,7 @@ bw_ggplot(data = mtcars,
 \index{devices!output|see{graphic output devices}}
 \index{plots!saving to file|see{plots, rendering}}
 \index{graphic output devices|(}
-\index{plots!rendering(}
+\index{plots!rendering|(}
 It is possible, when using \RStudio, to directly export the displayed plot to a file using a menu. However, if the file will have to be generated again at a later time, or a series of plots need to be produced with consistent format, it is best to include the commands to export the plot in the script.
 
 In \Rlang,\index{plots!printing}\index{plots!saving}\index{plots!output to files} files are created by printing to different devices. Printing is directed to a currently open device such a window in \RStudio. Some devices produce screen output, others files. Devices depend on drivers. There are both devices that are part of \Rlang and additional ones defined in contributed packages.

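The last hunk of R.plotting.Rnw explains that in R files are created by printing to a device. A minimal sketch of that workflow with a PDF file device; the file name and the plot are placeholders, and base graphics are used only to keep the example free of package dependencies.

```r
# Write a plot to a PDF file: open a file device, draw, then close the device.
# The file name and the plotted data are placeholders for illustration.
out_file <- file.path(tempdir(), "example-plot.pdf")

pdf(file = out_file, width = 7, height = 5)  # open the device (size in inches)
plot(1:10, (1:10)^2, type = "b", xlab = "x", ylab = "x squared")
dev.off()                                    # close the device so the file is complete

file.exists(out_file)
```

For a ggplot object the same pattern should apply with an explicit `print()` call on the plot between `pdf()` and `dev.off()`.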
diff --git a/R.stats.rnw b/R.stats.rnw
@@ -21,7 +21,7 @@ This chapter aims to give the reader only a quick introduction to statistics in
 %\emph{At present I use several examples adapted from the help pages for the functions described. I may revise this before publication.}
 
 \section{Statistical summaries}
-\index{functions!built-in|see@{functions, base-R}}%
+\index{functions!built-in|see {functions, base-R}}%
 \index{functions!base R}\index{summaries!statistical}
 Being the main focus of the \Rlang language in data analysis and statistics, it provides functions for both simple and complex calculations, going from means and variances to fitting very complex models. Below are examples of functions implementing the calculation of the frequently used data summaries mean or average (\Rfunction{mean()}), variance (\Rfunction{var()}), standard deviation (\Rfunction{sd()}), median (\Rfunction{median()}), median absolute deviation (\Rfunction{mad()}), mode (\Rfunction{mode()}), maximum (\Rfunction{max()}), minimum (\Rfunction{min()}), range (\Rfunction{range()}), quantiles (\Rfunction{quantile()}), length (\Rfunction{length()}), and all-encompassing summaries (\Rfunction{summary()}). All these methods accept numeric vectors and matrices as argument. Some of them also have definitions for other classes such as data frames in the case of \Rfunction{summary()}. (The \Rlang language does not define a function for calculation of the standard error of the mean. Please, see section \ref{sec:functions:sem} on page \pageref{sec:functions:sem} for how to define your own.)
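The summary functions listed in this paragraph can be tried on a short numeric vector. The sketch below also defines a standard error of the mean, since the text notes that R does not provide one; the helper `sem()` is a hypothetical definition written for this example and may differ from the one in the referenced section.

```r
# Frequently used summaries of a numeric vector (the data are made up).
x <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.8)

mean(x)                    # arithmetic mean
var(x)                     # sample variance
sd(x)                      # sample standard deviation
median(x)
mad(x)                     # median absolute deviation (scaled by 1.4826 by default)
range(x)                   # minimum and maximum
quantile(x, c(0.25, 0.75)) # lower and upper quartiles
length(x)
summary(x)

# Standard error of the mean; `sem()` is an illustrative helper, not a base R function.
sem <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

sem(x)
```

Dropping missing values before counting, as `sem()` does when `na.rm = TRUE`, keeps the denominator consistent with the observations that `sd()` actually used.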