Proof Ch 10
aphalo committed Feb 17, 2024
1 parent 51686de commit 361105e
Showing 438 changed files with 54,231 additions and 54,542 deletions.
Binary file modified CRC-2nd-ed/proofs/9781032518435_prelims.docx
Binary file not shown.
Binary file added R-genrated-figures.zip
Binary file not shown.
2 changes: 1 addition & 1 deletion R.data.Rnw
@@ -86,7 +86,7 @@ The authors of package \pkgname{tibble} describe their \Rclass{tbl} class as nea

The class and methods that package \pkgname{tibble} defines lift some of the restrictions imposed by the design of base \Rlang data frames at the cost of creating some incompatibilities due to changed (improved) syntax for member extraction. Tibbles simplify the creation of ``columns'' of class \Rclass{list} and remove support for columns of class \Rclass{matrix}. Handling of attributes is also different, with no row names added by default. There are also differences in default behaviour of both constructors and methods.

\emph{Although objects of class \Rclass{tbl} can be passed as arguments to functions that expect data frames as input, these functions are not guaranteed to work correctly with tibbles as a result of the differences in syntax of some methods.}
\emph{Although objects of class \Rclass{tbl} can be passed as arguments to functions that expect data frames as input, these functions are not guaranteed to work correctly with tibbles as a result of the differences in behaviour of some methods and operators.}
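
One such difference in behaviour can be sketched as follows. This is a minimal illustration, assuming package \pkgname{tibble} is installed; the objects \code{df.demo} and \code{tb.demo} are hypothetical and not part of the book's examples. With two indices, the extraction operator returns a vector from a data frame but a one-column tibble from a tibble.

```r
library(tibble)

# Hypothetical example data, not taken from the book
df.demo <- data.frame(a = 1:3, b = c("x", "y", "z"))
tb.demo <- as_tibble(df.demo)

# Extracting a single column with `[` and two indices:
col.df <- df.demo[ , "a"]  # data frame: dimension dropped, a vector is returned
col.tb <- tb.demo[ , "a"]  # tibble: nothing dropped, a one-column tibble is returned

is.data.frame(col.df)
is.data.frame(col.tb)

# `[[` extracts the column as a vector in both cases, making it a safer choice
identical(df.demo[["a"]], tb.demo[["a"]])
```

Code that relies on the vector returned by the first form may fail, or silently misbehave, when handed a tibble, whereas code written with \code{[[ ]]} should behave the same with both classes.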

\begin{warningbox}
It is easy to write code that will work correctly both with data frames and tibbles by avoiding constructs that behave differently. However, code that is syntactically correct according to the \Rlang language may fail to work as expected if a tibble is used in place of a data frame. Only functions tested to work correctly with both tibbles and data frames can be relied upon as compatible.
4 changes: 2 additions & 2 deletions R.data.containers.Rnw
@@ -693,7 +693,7 @@ Although in this last example we used numeric indexes to make it more interestin
\end{warningbox}

\subsection{Summarising and splitting data frames}\label{sec:calc:df:split}\label{sec:calc:df:aggregate}
Function\index{data frames!summarizing} \Rfunction{summary()} can be used to obtain a summary from objects of most \Rlang classes, including data frames. It is also possible to use \Rloop{sapply()}, \Rloop{lapply()} or \Rloop{vapply()} to apply any suitable function to data by columns (see section \ref{sec:data:apply} on page \pageref{sec:data:apply} for a description of these functions and their use).
Function\index{data frames!summarising} \Rfunction{summary()} can be used to obtain a summary from objects of most \Rlang classes, including data frames. It is also possible to use \Rloop{sapply()}, \Rloop{lapply()} or \Rloop{vapply()} to apply any suitable function to data by columns (see section \ref{sec:data:apply} on page \pageref{sec:data:apply} for a description of these functions and their use).

<<data-frames-7aaa>>=
summary(df8)
@@ -751,7 +751,7 @@ For these data, as the only non-numeric variable is \code{Species}, we could hav
\end{faqbox}

\begin{explainbox}
There\index{data frames!summarizing} is also a formula-based \Rfunction{aggregate()} method (or ``variant'') available (\Rlang \emph{formulas} are described in depth in section \ref{sec:stat:formulas} on page \pageref{sec:stat:formulas}). In \Rfunction{aggregate()}, the left-hand side (\emph{lhs}) of the formula indicates the variable to summarise and its right-hand side (\emph{rhs}) the factor used to split or group the data before summarising them.
There\index{data frames!summarising} is also a formula-based \Rfunction{aggregate()} method (or ``variant'') available (\Rlang \emph{formulas} are described in depth in section \ref{sec:stat:formulas} on page \pageref{sec:stat:formulas}). In \Rfunction{aggregate()}, the left-hand side (\emph{lhs}) of the formula indicates the variable to summarise and its right-hand side (\emph{rhs}) the factor used to split or group the data before summarising them.

<<data-frames-7d>>=
aggregate(x1 ~ z, FUN = mean, data = df10)
102 changes: 50 additions & 52 deletions R.data.io.Rnw

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions R.plotting.Rnw
@@ -79,15 +79,15 @@ eval_plots_all <- FALSE
@

\section{The Components of a Plot}
I\index{data visualization!concepts} start by briefly presenting concepts central to data visualisation, following the \citetitle{Koponen2019} \autocite{Koponen2019}. Plots are a medium used to convey information, like text. It is worthwhile keeping this in mind. As with text, the design of plots needs to consider what needs to be highlighted to convey the take-home message. The style of the plot should match the expectations and the plot-reading abilities of the expected audience. One needs to be careful to avoid ambiguities and, most importantly of all, not to misinform. Data visualisations, like text, need to be planned, revised, commented upon, and revised again until the best way of expressing our message is found. The flexibility of the grammar of graphics supports this approach to designing and producing high-quality data visualisations for different audiences very well.
I\index{data visualisation!concepts} start by briefly presenting concepts central to data visualisation, following the \citetitle{Koponen2019} \autocite{Koponen2019}. Plots are a medium used to convey information, like text. It is worthwhile keeping this in mind. As with text, the design of plots needs to consider what needs to be highlighted to convey the take-home message. The style of the plot should match the expectations and the plot-reading abilities of the expected audience. One needs to be careful to avoid ambiguities and, most importantly of all, not to misinform. Data visualisations, like text, need to be planned, revised, commented upon, and revised again until the best way of expressing our message is found. The flexibility of the grammar of graphics supports this approach to designing and producing high-quality data visualisations for different audiences very well.

Of course, when exploring data, fancy details of graphical design are irrelevant, but flexibility remains important as it makes it possible to look at data from many differing angles, highlighting different aspects of them. In the same way as boiler-plate text and text templates have specific but limited uses, all-in-one functions for producing plots do not adequately support the design of original data visualisations. They tend to get the job done, but lack the flexibility needed to do the best job of communicating information. As this is a book about languages, the focus of this chapter is on the layered grammar of graphics.

The plots described in this chapter are classified as \emph{statistical graphics}\index{statistical graphics} within the broader field of data visualisation. Plots such as scatter plots include points (geometric objects) that by their position, shape, colour, or some other property directly convey information. The location of these points in the plot ``canvas'' or ``plotting area'', given by the values of their $x$ and $y$ coordinates, describes properties of the data, and any deviation in the mapping of observations to coordinates is misleading because it conveys false information to the audience.

A \emph{data label}\index{data visualization!data labels} is connected to an observation but its position can be displaced as long as its link to the corresponding observation can be inferred, e.g., by the direction of an arrow or even simple proximity. Data labels provide ancillary information, such as the name of a gene or place.
A \emph{data label}\index{data visualisation!data labels} is connected to an observation but its position can be displaced as long as its link to the corresponding observation can be inferred, e.g., by the direction of an arrow or even simple proximity. Data labels provide ancillary information, such as the name of a gene or place.

\emph{Annotations}\index{data visualization!annotations} are additions to a plot that are connected not to individual observations but to all observations taken together, e.g., a text like $n = 200$ indicating the number of observations, usually included in a corner or margin of a plot free of observations.
\emph{Annotations}\index{data visualisation!annotations} are additions to a plot that are connected not to individual observations but to all observations taken together, e.g., a text like $n = 200$ indicating the number of observations, usually included in a corner or margin of a plot free of observations.

Axis and tick labels, legends and keys make it possible for the reader to retrieve the original values represented in the plot as graphical elements. Other features of visualisations, even when not carrying additional information, affect the ease with which a plot can be read and its accessibility to readers with visual constraints such as colour blindness. These features include the size of text and symbols, thickness of lines, choice of font face, choice of colour palette, etc.

@@ -2767,9 +2767,9 @@ It is relatively common to use inset tables, plots, bitmaps, or vector graphics
p <- ggplot(data = fake2.data, mapping = aes(x = z, y = y)) +
geom_point()
p + expand_limits(x = 40) +
annotation_custom(ggplotGrob(p + coord_cartesian(xlim = c(5, 10), ylim = c(20, 40)) +
annotation_custom(ggplotGrob(p + coord_cartesian(xlim = c(4, 10), ylim = c(20, 30)) +
theme_bw(10)),
xmin = 21, xmax = 40, ymin = 30, ymax = 60)
xmin = 25, xmax = 40, ymin = 30, ymax = 60)
@

This approach has the limitation, shared with the use of \Rfunction{annotate()}, that if used together with faceting, the inset is added identically to all plot panels.
@@ -2810,7 +2810,7 @@ Function \Rfunction{annotate()} cannot be used with \code{geom = "vline"} or \co
\section{Coordinates and Circular Plots}\label{sec:plot:circular}\label{sec:plot:coord}
\index{grammar of graphics!polar coordinates|(}
\index{plots!circular|(}
The grammar of graphics, as implemented in \ggplot, allows many different combinations of its ``words'', and this is also how circular plots are created. To obtain circular plots, we use the same \emph{geometries}, \emph{statistics}, and \emph{scales} we have been using above, but combined with polar coordinates instead of the default cartesian coordinates. We override the default by adding \ggcoordinate{coord\_polar()} to the plot so that the \code{x} and \code{y} \textit{aesthetics} correspond to the angle and radial distance, respectively.
The grammar of graphics, as implemented in \ggplot, allows many different combinations of its ``words'', and this is also how circular plots are created. To obtain circular plots, we use the same \emph{geometries}, \emph{statistics}, and \emph{scales} we have been using above, but combined with polar coordinates instead of the default cartesian coordinates. We override the default by adding \ggcoordinate{coord\_polar()} to the plot so that the \code{x} and \code{y} \textit{aesthetics} correspond to the angle and radial distance, respectively.

Special systems of coordinates, such as \ggcoordinate{coord\_sf()}, used for maps, support different projections. In contrast, coordinate functions such as \ggcoordinate{coord\_flip()}, \ggcoordinate{coord\_trans()}, and \ggcoordinate{coord\_fixed()} offer variations based on the cartesian system.
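
The effect of overriding the default coordinates can be sketched with a toy example. This is only an illustration under stated assumptions: package \pkgname{ggplot2} is installed, and the data frame \code{wind.data} and its values are hypothetical, not taken from the book. The same layers are used in both plots; only the coordinate system differs.

```r
library(ggplot2)

# Hypothetical toy data: counts for four compass directions
wind.data <- data.frame(
  direction = factor(c("N", "E", "S", "W"), levels = c("N", "E", "S", "W")),
  count = c(12, 7, 3, 9)
)

# Default cartesian coordinates: an ordinary column plot
p.cartesian <- ggplot(wind.data, aes(x = direction, y = count)) +
  geom_col()

# Same geometry and mappings, but x is now mapped to the angle
# and y to the radial distance
p.polar <- p.cartesian + coord_polar()

p.polar
```

Because the coordinate system is a separate ``word'' of the grammar, nothing else in the plot definition has to change when switching between the two versions.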
