progress with Ch 5, starting Ch7
aphalo committed Sep 23, 2023
1 parent 206d230 commit 539181b
Showing 32 changed files with 340,837 additions and 335,337 deletions.
38 changes: 24 additions & 14 deletions R.as.calculator.Rnw
Expand Up @@ -2142,11 +2142,8 @@ rm(list = setdiff(ls(pattern="*"), to.keep))
\section{Matrices and multidimensional arrays}\label{sec:matrix:array}
\index{matrices|(}\index{arrays|(}\qRclass{matrix}\qRclass{array}

Matrices have two dimensions, rows and columns, and like vectors all their members share the same mode, and are atomic, i.e., they are homogeneous. Most commonly, matrices are used to store \code{numeric}, \code{integer} or \code{logical} values. The number of rows and columns can differ, so matrices can be either square or rectangular in shape, but never ragged.

In \Rlang, the first index always denotes rows and the second index always denotes columns. The diagram below depicts a matrix, $A$, with $m$ rows and $n$ columns and size equal to $m \times n$ ``cells'', with individual values denoted by $a_{i,j}$. Here we use a simpler representation than that used for vectors on page \pageref{par:calc:vectors:diag} above, but the same concepts apply.

\begin{center}
\begin{figure}
\centering
\begin{footnotesize}
\begin{tikzpicture}[auto matrix/.style={matrix of nodes,
draw,thick,inner sep=0pt,
Expand Down Expand Up @@ -2196,15 +2193,19 @@ In \Rlang, the first index always denotes rows and the second index always denot
\draw[thick,-stealth] ([xshift=-2ex]matx.north west)
-- ([xshift=-2ex]matx.south west) node[midway,above,rotate=90] {Rows or margin 1: $i = 1$ to $i = m$};
\end{tikzpicture}
\end{footnotesize}
\end{center}\label{fig:matrix:margins}
\end{footnotesize}\vspace{-1ex}
\caption[Diagram of an \Rlang matrix.]{Diagram of an \Rlang matrix showing indexing of members.}\label{fig:matrix:margins}
\end{figure}

Matrices have two dimensions, rows and columns, and, like vectors, all their members share the same mode and are atomic, i.e., matrices are homogeneous (Figure \ref{fig:matrix:margins}). Most commonly, matrices are used to store \code{numeric}, \code{integer} or \code{logical} values. The number of rows and columns can differ, so matrices can be either square or rectangular in shape, but never ragged.

In \Rlang, the first index always denotes rows and the second index always denotes columns. The diagram depicts a matrix, $A$, with $m$ rows and $n$ columns and a size of $m \times n$ ``cells'', with individual values denoted by $a_{i,j}$. Here we use a simpler representation than that used for vectors on page \pageref{par:calc:vectors:diag}, but the same concepts apply.

\begin{warningbox}
In \Rlang documentation, the individual dimensions of matrices and arrays are frequently called \emph{margins}, numbered in the same order as the indices are given. Thus, in a matrix the first margin corresponds to rows and the second one to columns.
\end{warningbox}
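
As a minimal sketch of how margins are numbered, the code below passes the margin number to parameter \code{MARGIN} of base \Rlang's \code{apply()} to compute sums by rows and by columns. The names \code{m}, \code{row.sums} and \code{col.sums} are chosen here only for illustration.

```r
# A 2 x 3 matrix filled column by column with the integers 1 to 6
m <- matrix(1:6, nrow = 2)

# MARGIN = 1 refers to rows: one sum per row
row.sums <- apply(m, MARGIN = 1, FUN = sum)
row.sums   # 9 12

# MARGIN = 2 refers to columns: one sum per column
col.sums <- apply(m, MARGIN = 2, FUN = sum)
col.sums   # 3 7 11
```

The length of each result matches the length of the margin used: two rows give two sums, and three columns give three sums.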

In mathematical notation the same generic matrix is represented as

\begin{equation*}
A_{m\times n} =
\begin{bmatrix}
Expand All @@ -2216,7 +2217,6 @@ In mathematical notation the same generic matrix is represented as
a_{m,1} & a_{m,2} & \cdots & a_{m,j} & \cdots & a_{m,n}
\end{bmatrix}
\end{equation*}

where $A$ represents the whole matrix, $m \times n$ its dimensions, and $a_{i,j}$ its elements, with $i$ indexing rows and $j$ indexing columns. The lengths of the two dimensions of the matrix are given by $m$ and $n$, for rows and columns, respectively.

Vectors have a single dimension, and, as described on page \pageref{par:calc:vectors:diag} above, we can query this dimension, their length, with function \Rfunction{length()}. Matrices have two dimensions, which can be queried individually with \Rfunction{ncol()} and \Rfunction{nrow()}, and jointly with \Rfunction{dim()}. As expected, \Rfunction{is.matrix()} can be used to test whether an object is a matrix.
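
The queries described above can be sketched as follows. The object names \code{v} and \code{m} are used only for this illustration.

```r
# A vector of 15 integers and a matrix built from it
v <- 1:15
m <- matrix(v, nrow = 3)

length(v)      # 15
nrow(m)        # 3
ncol(m)        # 5
dim(m)         # 3 5, rows first, then columns
length(m)      # 15, the total number of members

is.matrix(m)   # TRUE
is.matrix(v)   # FALSE
dim(v)         # NULL, a plain vector has no dim attribute
```

Note that \code{length()} applied to a matrix returns the total number of members, i.e., the product of the two values returned by \code{dim()}.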
Expand All @@ -2230,6 +2230,14 @@ matrix(1:15, nrow = 3)

When a matrix is printed in \Rlang the row and column indexes are indicated on the left and top margins, in the same way as they would be used to extract whole rows and columns.

\begin{explainbox}
Matrices are most useful for storage of numeric values, as matrix algebra plays an important role in statistical computations. This notwithstanding, it is possible to create matrices (and arrays) from atomic vectors of other classes such as \Rclass{logical} or \Rclass{character}. The main difference is that, for such matrices, few operations other than retrieval of members using two indices are meaningful.

<<matrix-character-01>>=
matrix(letters[1:15], nrow = 3)
@
\end{explainbox}

When a vector is converted to a matrix, \Rlang's default is to allocate the values in the vector to the matrix starting from the leftmost column, and within the column, down from the top. Once the first column is filled, the process continues from the top of the next column, as can be seen above. This order can be changed as you will discover in the playground below.
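
A brief sketch of this default, column-wise, allocation is given here, using the same call as above and extracting members by their row and column indices. The name \code{m} is chosen only for this illustration.

```r
# Values 1 to 15 are allocated column by column into 3 rows
m <- matrix(1:15, nrow = 3)

m[ , 1]    # first column: 1 2 3
m[1, ]     # first row: 1 4 7 10 13
m[2, 3]    # 8, second row of the third column
```

The first three values of the vector fill the first column, so consecutive members of a row are three positions apart in the original vector.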

\begin{playground}
Expand Down Expand Up @@ -2358,10 +2366,9 @@ dim(no.elem.matrix)

\end{explainbox}

Arrays\index{matrix!dimensions}\index{arrays!dimensions} are similar to matrices, but can have one or more dimensions. The dimensions of an array can be queried with \Rfunction{dim()}, similarly as with matrices. Whether an \Rlang object is an array can be found out with \Rfunction{is.array()}. The diagram below depicts an array, $A$ with three dimensions giving a size equal to $l\times m \times n$, and individual values denoted by $a_{i,j,k}$.

\begin{figure}
\centering
%\usetikzlibrary{matrix}
\begin{center}
\newcounter{kmargincount}
\begin{footnotesize}
\begin{tikzpicture}[auto matrix/.style={matrix of nodes,
Expand Down Expand Up @@ -2439,8 +2446,11 @@ Arrays\index{matrix!dimensions}\index{arrays!dimensions} are similar to matrices
\draw[thick,-stealth] ([xshift=-2ex]matx.north west)
-- ([xshift=-2ex]matx.south west) node[midway,above,rotate=90] {Margin 1: $i = 1$ to $i = l$};
\end{tikzpicture}
\end{footnotesize}
\end{center}\label{fig:array:margins}
\end{footnotesize}\vspace{-1ex}
\caption[Diagram of an \Rlang array.]{Diagram of an \Rlang array with three dimensions showing indexing of members.}\label{fig:array:margins}
\end{figure}

Arrays\index{matrix!dimensions}\index{arrays!dimensions} are similar to matrices, but can have one or more dimensions (Figure \ref{fig:array:margins}). The dimensions of an array can be queried with \Rfunction{dim()}, as with matrices. Whether an \Rlang object is an array can be tested with \Rfunction{is.array()}. The diagram depicts an array, $A$, with three dimensions giving a size equal to $l\times m \times n$, and individual values denoted by $a_{i,j,k}$.

When calling the constructor \Rfunction{array()}, dimensions are specified with the argument passed to parameter \code{dim}.
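
As a minimal sketch, the code below constructs an array with three dimensions by passing a vector of three lengths to \code{dim}, and then queries it. The name \code{a} and the chosen lengths are used only for this illustration.

```r
# An array with l = 2, m = 3 and n = 4, filled with the integers 1 to 24
a <- array(1:24, dim = c(2, 3, 4))

dim(a)         # 2 3 4
length(a)      # 24, equal to 2 * 3 * 4
is.array(a)    # TRUE
is.matrix(a)   # FALSE, as a matrix has exactly two dimensions

a[1, 1, 2]     # 7, the first member of the second "slice"
a[2, 3, 4]     # 24, the last member
```

As with matrices, the values are allocated with the first index varying fastest, then the second, and finally the third.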

Expand Down
2 changes: 1 addition & 1 deletion R.data.Rnw
Expand Up @@ -490,7 +490,7 @@ What is the difference between the values returned by the two functions? Could s
\end{advplayground}
\index{reshaping tibbles|)}

\section{Data manipulation with \pkgname{dplyr}}
\section{Data manipulation with \pkgname{dplyr}}\label{sec:dplyr:manip}
\index{data manipulation in the tidyverse|(}

The first advantage a user of the \pkgname{dplyr} functions and methods sees is the completeness of the set of operations supported and the symmetry and consistency among the different functions. A second advantage is that almost all the functions are defined not only for objects of class \Rclass{tibble}, but also for objects of class \code{data.table} (package \pkgname{dtplyr}) and for SQL databases (package \pkgname{dbplyr}), with consistent syntax (see also section \ref{sec:data:db} on page \pageref{sec:data:db}). A downside of \pkgname{dplyr} and much of the \pkgname{tidyverse} is that the syntax is not yet fully stable. Additionally, some function and method names either override those in base \Rlang or clash with names used in other packages. \Rlang itself is extremely stable and expected to remain forward and backward compatible for a long time. For code intended to remain in use for years, the fewer packages it depends on, the less maintenance it will need. When using the \pkgname{tidyverse} we need to be prepared to revise our own dependent code after any major revision to the \pkgname{tidyverse} packages we use.
Expand Down
2 changes: 1 addition & 1 deletion R.data.containers.Rnw
Expand Up @@ -887,7 +887,7 @@ df14$D <- df14$A / df14$B + 1
head(df14, 2)
@

Using\index{data frames!attaching} \Rscoping{attach()} we can alter where \Rlang looks up names and consequently simplify the statement. With \Rscoping{detach()} we can restore the original state. It is important to remember that here we can only simplify the right-hand side of the assignment, while the ``destination'' of the result of the computation still needs to be fully specified on the left-hand side of the assignment operator. We include below only one statement between \Rscoping{attach()} and \Rscoping{detach()} but multiple statements are allowed. Furthermore, if variables with the same name as the columns exist in the search path, these will take precedence, something that can result in bugs or crashes, or as seen below, a message warns that variable \code{A} from the global environment will be used instead of column \code{A} of the attached \code{df17}. The returned value is, of course, not the desired one.
Using\index{data frames!attaching}\label{par:calc:attach} \Rscoping{attach()} we can alter where \Rlang looks up names and consequently simplify the statement. With \Rscoping{detach()} we can restore the original state. It is important to remember that here we can only simplify the right-hand side of the assignment, while the ``destination'' of the result of the computation still needs to be fully specified on the left-hand side of the assignment operator. We include below only one statement between \Rscoping{attach()} and \Rscoping{detach()} but multiple statements are allowed. Furthermore, if variables with the same names as the columns exist earlier in the search path, these take precedence, which can result in bugs or crashes. In the example below, a message warns that variable \code{A} from the global environment will be used instead of column \code{A} of the attached \code{df17}. The returned value is, of course, not the desired one.

<<data-frames-EB-11a>>=
df17 <- data.frame(A = 1:10, B = 3)
Expand Down