Did some minor corrections suggested by Titta.
aphalo committed Aug 15, 2017
1 parent 898e145 commit 91e9ffa
Showing 14 changed files with 325,443 additions and 163,755 deletions.
2 changes: 1 addition & 1 deletion R.as.calculator.Rnw
@@ -13,7 +13,7 @@ opts_knit$set(concordance=TRUE)

In my experience, for those not familiar with computing programming or scripting languages, and who have mostly used computer programs through visual interfaces making heavy use of menus and icons, the best first step in learning \Rlang is to learn the basics of the language through its use at the R command prompt. This will teach not only the syntax and grammar rules, but also give a glimpse at the advantages and flexibility of this approach to data analysis.

-Menu-driven programs are not necessarily bad, they are just unsuitable when there is a need to set very many options and chose from many different actions. They are also difficult to maintain when extensibility is desired, and when independently developed modules of very different characteristics need to be integrated. Textual languages also have the advantage, to be dealt with in the next chapter, that command sequences can be stored in human- and computer readable text files. Such files constitute a record of all the steps used and in most cases makes it trivial to reproduce the same steps at a later time. Scripts are also a very simple and handy way of communicating to others how to do a given data analysis.
+Menu-driven programs are not necessarily bad, they are just unsuitable when there is a need to set very many options and choose from many different actions. They are also difficult to maintain when extensibility is desired, and when independently developed modules of very different characteristics need to be integrated. Textual languages also have the advantage, to be dealt with in the next chapter, that command sequences can be stored in human- and computer readable text files. Such files constitute a record of all the steps used and in most cases makes it trivial to reproduce the same steps at a later time. Scripts are also a very simple and handy way of communicating to others how to do a given data analysis.

\section{Working at the R console}

4 changes: 2 additions & 2 deletions R.functions.Rnw
@@ -7,11 +7,11 @@ opts_knit$set(concordance=TRUE)

\chapter{R built-in functions}\label{chap:R:functions}

-\dictum[Howard Aiken, \emph{Proposed automatic calculating machine}, presented to IBM in 1937]{The desire to economize time and mental effort in arithmetical computations, and to eliminate human liability to error, is probably as old as the science of arithmetic itself.}\vskip2ex
+\dictum[Alfred V. Aho, Jeffrey D. Ullman, \emph{Foundations of Computer Science}, Computer Science Press, 1992]{Computer Science is a science of abstraction---creating the right model for a problem and devising the appropriate mechanizable techniques to solve it.}\vskip2ex

\section{Aims of this chapter}

-The aim of this chapter is to introduce some of the frequently used functions available in \pgrmname{R} including a sample of those used for statistical tests and model fitting. The \pgrmname{R} distribution includes both built-in functionality plus a set of recommended packages which one can count on always being available.
+The aim of this chapter is to introduce some of the frequently used functions available in \pgrmname{R} including a sample of those used for statistical tests and model fitting. The \pgrmname{R} distribution includes both built-in functionality plus a set of recommended packages which one can count on always being available.

This chapter provides by necessity a very incomplete introduction to the capabilities of base R. This chapter is designed to give the reader only a quick introduction to base \R, as there are many good texts on the capabilities of \Rpgrm, going from the brief and concise books \citetitle{Beckerman2012} \autocite{Beckerman2012} and \citetitle{Allerhand2011} \autocite{Allerhand2011} at one extreme to the bulky and comprehensive \citetitle{Crawley2012} \autocite{Crawley2012} at the other. Books most useful as companions to the present book will be those somewhere in-between these two extremes. Three good examples with broad scope are \citetitle{Dalgaard2008} \autocite{Dalgaard2008}, \citetitle{Everitt2009} \autocite{Everitt2009} and \citetitle{??} \autocite{??}. Furthermore, many of base \R's functions are specific to different statistical procedures, maths and calculus, that transcend the description of \langname{R} as a programming language. The use of \pgrmname{R} for the analysis of different kinds of data and using different methods is covered by a vast bibliography, to which we provide some pointers in chapter \ref{chap:R:readings} on page \pageref{chap:R:readings}.

6 changes: 3 additions & 3 deletions R.more.plotting.Rnw
@@ -19,7 +19,7 @@ In contrast with previous chapters, I expect readers to first browse through the
In this chapter we use mostly the modernized data frames of package \pkgname{tibble}. The main reason is that the \Rfunction{tibble()} constructor does not by default convert character variables into factors as the \Rfunction{data.frame()} constructor does. The format used for printing is also improved. It is possible to use \Rfunction{data.frame()} instead of \Rfunction{tibble()} in most examples given in this chapter, but in some cases you will need to add \code{stringsAsFactors = FALSE} to the call.
\end{warningbox}

-As the previous chapter, the present one focuses mainly on how to construct different types of graphical data displays using the grammar of graphics. We also discuss how to alter de ``graphical design'' of the plots produced, but in less depth, mostly leaving for the reader to try by herself/himself the different combinations of types of plots and themes and color palettes described. There is no book covering the use of all the packages described here, and for each package additional examples and explanations will be found in their documentation, which in many cases includes vignettes with extended use examples.
+As the previous chapter, the present one focuses mainly on how to construct different types of graphical data displays using the grammar of graphics. We also discuss how to alter the ``graphical design'' of the plots produced, but in less depth, mostly leaving for the reader to try by herself/himself the different combinations of types of plots and themes and color palettes described. There is no book covering the use of all the packages described here, and for each package additional examples and explanations will be found in their documentation, which in many cases includes vignettes with extended use examples.

\section{Packages used in this chapter}

@@ -617,7 +617,7 @@ my.data <-
data.frame(y = c(rnorm(n = 100, mean = -1, sd = 1),
rnorm(n = 50, mean = 1, sd = 1),
rnorm(n = 50, mean = 1, sd = 0.3)),
-group = factor(x = rep(c("A", "B", "c"), times = c(100, 50, 50))) )
+group = factor(x = rep(c("A", "B", "C"), times = c(100, 50, 50))) )
@

Method \code{"quasirandom"}, used by default.
@@ -1178,7 +1178,7 @@ ggplot(AirPassengers) +
\index{plots!adding tables}
\index{plots!geometries!table@table}

-The \gggeom{geom\_table()} plots a data frame or tibble, nested in a tibble pased as data
+The \gggeom{geom\_table()} plots a data frame or tibble, nested in a tibble passed as data
argument. The \emph{aesthetics} \code{x} and \code{y} are used for positioning,
and the \code{label} aesthetic for the data frame containing the table's content.
The table is created as a 'grid' \code{grob} and added as usual to the \code{ggplot} object. Justification, as in a text label, controls where within the table the $x$ and $y$
6 changes: 3 additions & 3 deletions R.plotting.Rnw
@@ -29,11 +29,11 @@ play.eval <- FALSE

\section{Aims of this chapter}

-Three main plotting systems are available to \R users: base R, package \pkgname{lattice} \autocite{Sarkar2008} and package \pkgname{ggplolt2} \autocite{Wickham2016}, being the last one the most recent and currently most popular system available in \R for plotting data. Even two different sets of graphics primitives are available in R, that in base R and a newer one in the \pkgname{grid} package \autocite{Murrell2011}.
+Three main plotting systems are available to \R users: base R, package \pkgname{lattice} \autocite{Sarkar2008} and package \pkgname{ggplot2} \autocite{Wickham2016}, being the last one the most recent and currently most popular system available in \R for plotting data. Even two different sets of graphics primitives are available in R, that in base R and a newer one in the \pkgname{grid} package \autocite{Murrell2011}.

In this chapter you will learn the concepts of the grammar of graphics, on which package \pkgname{ggplot2} is based. You will as well learn how to do many of the data plots that can be produced with package \pkgname{ggplot2}. We will focus only on the grammar of graphics, as it is currently the most used plotting approach in R. As a consequence of this popularity and its flexibility, many extensions to \pkgname{ggplot2} have been developed and deposited in public repositories. Several of these packages will be described in Chapter \ref{chap:R:more:ggplotting} starting on page \pageref{chap:R:more:ggplotting} and in Chapter \ref{chap:R:maps} starting on page \pageref{chap:R:maps}. As previous chapters, this chapter is intended to be read in whole.

-This chapter focuses mainly on how to construct different types of graphical data displays using the grammar of graphics. We also discuss how to alter de ``graphical design'' of the plots produced, but in less depth, mostly leaving for the reader to try by herself/himself the different combinations of types of plots and themes and color palettes described. The book \citetitle{Burchell2016} \autocite{Burchell2016} has a strong focus on the control of how plots look, and can be a good source of worked out examples. For a cook book with a broader scope and detailed explanations consult \citetitle{Chang2013} \autocite{Chang2013}. The contents of the current chapter to some extent overlap with that of Chang's book, but using a different approach for presentation. Deeper explanations of technical aspects are available in the book \citetitle{Murrell2011} \autocite{Murrell2011}. Finally, the book \citetitle{Wickham2016} \autocite{Wickham2016} written by the developers of package \pkgname{ggplot2} is the main reference, and describes the grammar of graphics in more detail than we have space here for. In particular, the hands-on approach followed here makes this chapter a good complement to \citetitle{Wickham2016}.
+This chapter focuses mainly on how to construct different types of graphical data displays using the grammar of graphics. We also discuss how to alter the ``graphical design'' of the plots produced, but in less depth, mostly leaving for the reader to try by herself/himself the different combinations of types of plots and themes and color palettes described. The book \citetitle{Burchell2016} \autocite{Burchell2016} has a strong focus on the control of how plots look, and can be a good source of worked out examples. For a cook book with a broader scope and detailed explanations consult \citetitle{Chang2013} \autocite{Chang2013}. The contents of the current chapter to some extent overlap with that of Chang's book, but using a different approach for presentation. Deeper explanations of technical aspects are available in the book \citetitle{Murrell2011} \autocite{Murrell2011}. Finally, the book \citetitle{Wickham2016} \autocite{Wickham2016} written by the developers of package \pkgname{ggplot2} is the main reference, and describes the grammar of graphics in more detail than we have space here for. In particular, the hands-on approach followed here makes this chapter a good complement to \citetitle{Wickham2016}.

\section{Packages used in this chapter}

@@ -103,7 +103,7 @@ The most frequently used coordinate system\index{plots!coordinates} when plottin

How the plots look when displayed or printed can be altered by means of themes\index{plots!themes}. A plot can be saved without adding a theme and then printed or displayed using different themes. Also individual theme elements can be changed, and whole new themes defined. This adds a lot of flexibility and helps in the separation of the data representation aspects from those related to the graphical design.

-As discussed above the grammar of graphics is based on aesthetics (\code{aes}) as for example color, geometric elements \code{geom\_\ldots} such as lines, and points, statistics \code{stat\_\ldots}, scales \code{scale\_\ldots}, labels \code{labs}, \code{coordinate} systems and themes \code{theme\_\ldots}. Plots are assembled from these elements. More than one \emph{geometry} and/or \emph{statistic} can be added to the same plot, resulting in \emph{layers} conceptually with the first one added located at the bottom of the ``pile''.
+As discussed above the grammar of graphics is based on aesthetics (\code{aes}) as for example color, geometric elements \code{geom\_\ldots} such as lines, and points, statistics \code{stat\_\ldots}, scales \code{scale\_\ldots}, labels \code{labs}, \code{coordinate} systems and themes \code{theme\_\ldots}. Plots are assembled from these elements. More than one \emph{geometry} and/or \emph{statistic} can be added to the same plot, resulting in \emph{layers} conceptually with the first one added located at the bottom of the ``pile''.

Even if we do not explicitly add them all, default, in many cases ``identity'' versions are involved. The production of a rendered graphic with package \pkgname{ggplot2} can be represented as a flow of information:
\textsf{data $\to$ scale $\to$ statistic $\to$ aesthetic $\to$ geometry $\to$ coordinate $\to$ ggplot $\to$ theme $\to$ rendered graphic}
6 changes: 3 additions & 3 deletions R.scripts.Rnw
@@ -79,7 +79,7 @@ Sourcing is very useful when the script is ready, however, while developing a sc
\section{How to write a script?}\label{sec:script:writing}
\index{scripts!writing}

-As with any type of writing various approaches may be preferred by different users. In general, the approach used, or mix of approaches will also depend on how confident you are that the statements will work as expected---you already know the best approach vs.\ you are exploring different alternatives.
+As with any type of writing various approaches may be preferred by different users. In general, the approach used, or mix of approaches will also depend on how confident you are that the statements will work as expected---you already know the best approach vs.\ you are exploring different alternatives.
\begin{description}
\item[If one is very familiar with similar problems] One would just create a new text file and write the whole thing in the editor, and then test it. This is rather unusual.
\item[If one if moderately familiar with the problem] One would write the script as above, but testing it, step by step as one is writing it. This is usually what I do.
@@ -152,7 +152,7 @@ If during testing, or during normal use, a wrong value is returned by a calculat
Diagnosing the source of bugs is in most cases like detective work. One uses hunches based on common sense and experience to try to locate the lines of code causing the problem. One follows different \emph{leads} until the case is solved. In most cases at the very bottom we rely on some sort of divide and conquer strategy. For example, we may check the value returned by intermediate calculations until we locate the earliest code statement producing a wrong value. Another common case is when some input values trigger a bug. In such cases it is frequently best to start by testing if different ``cases'' of input lead to errors/crashes or not. Boundary input values are usually the telltale ones: e.g.\ for numbers, zero, negative and positive values, very large values, very small values, missing values (\code{NA}), vectors of length zero (\code{numeric()}), etc.

\begin{warningbox}
-\paragraph{Error messages} When debugging keep in mind that in some cases a single bug can lead to a whole cascade of error messages. Do also keep in mind that typing mistakes, originating when code is entered through the keyboard, can break havock in a script: usually there is little correspondence between the number of error messages and the seriousness of the bug triggering them. When several errors are triggered, start by reading the error message printed first, as later errors can be an indirect consequence of earlier ones.
+\textbf{Error messages} When debugging keep in mind that in some cases a single bug can lead to a whole cascade of error messages. Do also keep in mind that typing mistakes, originating when code is entered through the keyboard, can break havock in a script: usually there is little correspondence between the number of error messages and the seriousness of the bug triggering them. When several errors are triggered, start by reading the error message printed first, as later errors can be an indirect consequence of earlier ones.
\end{warningbox}

There are special tools, called debuggers, available, and they help enormously. Debuggers allow one to step through the code, executing one statement at a time and at each pause allowing the user to inspect the objects present in the R environment and their values. It is even possible to execute additional statements, say to modify the value of a variable, while execution is paused. An R debugger is available within \RStudio and also through the R console.
@@ -521,7 +521,7 @@ However, there are cases were we need to repeatedly execute statements in a way

\subsection{Iteration}
\index{for@\code{for}}\index{iteration!for loop}
-We give the name \emph{iteration} to the process of repetitive execution of a program statement (simple or compound)---e.g.\ computed by iteration. We use the same word, iteration, also to name each one of these repetitions of the execution of a statement--e.g.\ the second iteration.
+We give the name \emph{iteration} to the process of repetitive execution of a program statement (simple or compound)---e.g.\ computed by iteration. We use the same word, iteration, also to name each one of these repetitions of the execution of a statement--e.g.\ the second iteration.

The section of computer code being executed multiple times, conforms a loop (a closed path). Most loops contain a condition that determines when execution will continue outside the loop. The most frequently used type of loop is a \code{for} loop. These loops work in R on lists or vectors of values to act upon.
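
The paragraph above can be illustrated with a minimal sketch, which is not part of the committed file. The vector \code{values}, the list \code{items} and the other variable names are assumptions chosen only for illustration. It shows a \code{for} loop acting first on the elements of a vector and then on the members of a list.

```r
# Illustrative sketch; all object names here are hypothetical.

# Iterate over the elements of a numeric vector, accumulating their sum.
values <- c(2, 5, 9)
total <- 0
for (v in values) {
  # one iteration per element of 'values'
  total <- total + v
}
print(total)

# The same construct walks over the members of a list.
items <- list(a = 1:3, b = letters[1:2])
lengths.seen <- numeric()
for (item in items) {
  # 'item' is the member itself, here an integer or a character vector
  lengths.seen <- c(lengths.seen, length(item))
}
print(lengths.seen)
```

In both loops the condition for leaving the loop is implicit: execution continues after the closing brace once every element has been visited.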
