Add section on object names as character strings. Testing with current versions of packages. Development version of 'ggplot2' from Github breaks a couple of examples from other packages.
aphalo committed Jun 13, 2017
1 parent e8e7f8e commit 947d1e7
Showing 19 changed files with 230,510 additions and 230,248 deletions.
4 changes: 2 additions & 2 deletions R.as.calculator.Rnw
@@ -703,7 +703,7 @@ my.factor <- factor(my.vector, levels = c(1, 0), labels = c("treated", "control"
my.factor
@

It is always preferable to use meaningful labels for levels, although it is possible to use numbers.
It is always preferable to use meaningful labels for levels, although it is possible to use numbers.

Converting factors into numbers is not intuitive, even when the factor was created from a \code{numeric} vector.
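
As a minimal sketch of this pitfall, assuming a hypothetical factor \code{num.factor} created only for this illustration, \code{as.numeric()} applied directly returns the integer codes of the levels, while converting first to \code{character} recovers the original values.

<<>>=
# hypothetical example data, not used elsewhere
num.factor <- factor(c(10, 20, 20, 30))
# integer codes of the levels, not the original values
as.numeric(num.factor)
# converting through character recovers the original values
as.numeric(as.character(num.factor))
@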

@@ -725,7 +725,7 @@ Create a factor with levels labeled with words. Create another factor with the l

Factors are very important in \Rpgrm. In contrast to other statistical software, in which the role of a variable is set when defining a model to be fitted or when setting up a test, in \Rpgrm models are specified in exactly the same way for ANOVA and regression analysis, as \emph{linear models}. What `decides' which type of model is fitted is whether the explanatory variable is a factor (giving ANOVA) or a numerical variable (giving regression). This makes a lot of sense because, in most cases, whether an explanatory variable is treated as categorical depends on the design of the experiment or survey. In other words, it is a property of the data and of the experiment or survey from which they originate, rather than of the data analysis.
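
The following is only a sketch of this idea, using simulated data and the hypothetical names \code{dose}, \code{resp}, \code{fm.reg} and \code{fm.aov}. The same call to \code{lm()} fits a regression when the explanatory variable is numeric and an ANOVA-type model when it is a factor, which can be seen in the number of fitted coefficients.

<<>>=
# simulated data, for illustration only
set.seed(1)
dose <- rep(c(1, 2, 3), each = 4)
resp <- 2 * dose + rnorm(12)
# numeric explanatory variable: regression (intercept and slope)
fm.reg <- lm(resp ~ dose)
length(coef(fm.reg))
# the same variable as a factor: ANOVA (one parameter per level)
fm.aov <- lm(resp ~ factor(dose))
length(coef(fm.aov))
@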

The order of the levels in a \code{factor} does not affect simple calculations or the values plotted, but it does affect how the output is printed, the order of the levels in the scales of plots, and in some cases the contrasts in significance tests. The default ordering is alphabetical, and is established at the time a factor is created. Consequently, rather frequently the default ordering of levels is not the one needed. As shown above, parameter \code{levels} in the constructor makes it possible to set the order of the levels. It is also possible to change the ordering of an existing factor.
The order of the levels in a \code{factor} does not affect simple calculations or the values plotted, but it does affect how the output is printed, the order of the levels in the scales of plots, and in some cases the contrasts in significance tests. The default ordering is alphabetical, and is established at the time a factor is created. Consequently, rather frequently the default ordering of levels is not the one needed. As shown above, parameter \code{levels} in the constructor makes it possible to set the order of the levels. It is also possible to change the ordering of an existing factor.
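
A minimal sketch of the difference, assuming hypothetical factors \code{size.default} and \code{size.ordered} built from the same values: only the order of the levels changes, not the values stored.

<<>>=
# hypothetical example values
sizes <- c("low", "high", "medium", "low")
# default: levels sorted alphabetically
size.default <- factor(sizes)
levels(size.default)
# order of the levels set explicitly in the constructor
size.ordered <- factor(sizes, levels = c("low", "medium", "high"))
levels(size.ordered)
# the values themselves are unchanged
all(as.character(size.default) == as.character(size.ordered))
@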

\begin{explainbox}
\textbf{Reordering factor levels.}\index{factors!reorder levels} The simplest approach is to use \Rfunction{factor()} and its \code{levels} parameter. The only complication is that the names of the existing levels and those passed as argument need to match, and typing mistakes can cause bugs. To avoid the error-prone step, in all examples except the first, we use \Rfunction{levels()} to retrieve the names of the levels from the factor itself.
10 changes: 6 additions & 4 deletions R.data.Rnw
@@ -153,19 +153,21 @@ We here use in examples paths and filenames valid in MS-Windows. We have tried t
\end{warningbox}

Functions \Rfunction{getwd()} and \Rfunction{setwd()} can be used to get the path to the current working directory and to set a directory as current, respectively.

<<filenames-05>>=
<<filenames-05,eval=FALSE>>=
# not run
getwd()
@

Function \Rfunction{setwd()} returns the path of the previous working directory, allowing us to portably set the working directory back to the previous one. Both relative paths, as in the example, and absolute paths are accepted as arguments.
<<filenames-06>>=
<<filenames-06,eval=FALSE>>=
# not run
oldwd <- setwd("..")
getwd()
@

The returned value is always an absolute full path, so it remains valid even if the working directory changes more than once before it is restored.
<<filenames-07>>=
<<filenames-07,eval=FALSE>>=
# not run
oldwd
setwd(oldwd)
getwd()
16 changes: 12 additions & 4 deletions R.friends.Rnw
@@ -47,7 +47,7 @@ Studying the book \citetitle{Wickham2014advanced} \autocite{Wickham2014advanced}

\section{Measuring and improving performance}

In this section we present simple ``toy'' examples of how execution speed of R code can be improved. These examples demonstrate the use of benchmarking and profiling tools and of R's built-in compiler to improve performance of R code.
In this section we present simple ``toy'' examples of how execution speed of R code can be improved. These examples demonstrate the use of benchmarking and profiling tools and of R's built-in compiler to improve performance of R code.

\subsection{Benchmarking}

@@ -225,10 +225,14 @@ A round about but very effective way of improving performance with a runtime tha
microbenchmark(my.fun13(row = 100, col = 100), unit = "ms")
@

<<>>=
microbenchmark(my.fun13(row = 1000, col = 1000), times = 20L, unit = "ms")
@

We may still want to know where time is being spent in the optimized version of the function.

<<>>=
prof13.df <- profr(my.fun13(row = 100, col = 100), interval = 0.00003)
prof13.df <- profr(my.fun13(row = 1000, col = 1000), interval = 0.001)
ggplot(prof13.df)
@
@@ -248,6 +252,10 @@ Yes, indeed, using \code{as\_tibble()} instead of \code{as.data.frame()} halves
microbenchmark(my.fun14(row = 100, col = 100), unit = "ms")
@

<<>>=
microbenchmark(my.fun14(row = 1000, col = 1000), unit = "ms")
@

\begin{playground}
We have gone very far in optimizing the function. In this last version the function returns a \code{tibble} instead of a \code{data.frame}. This can be expected to affect the performance of different operations, from indexing and computations to summaries, when applied to the returned object. Use benchmarking to assess these differences, both for cases with substantially more columns than rows, and for cases with more rows than columns. Think carefully about a test case that makes heavy use of indexing, calculations combining several columns, and sorting.
\end{playground}
@@ -261,7 +269,7 @@ We can see if a function is compiled, by printing it and looking if it contains

To test what speed-up compiling can achieve for this small function we momentarily switch off default compiling with function \code{enableJIT()} (JIT is an abbreviation for Just In Time compiler). Possible values of \code{level} range from 0 to 3. Zero disables the compiler, while 1, 2 and 3 indicate use of the compiler by default in an increasing number of situations.

We define, using a different name, the same function as earlier, and we check that it has not been compiled. Then we compile it.
We define, using a different name, the same function as earlier, and we check that it has not been compiled. Then we compile it.

<<>>=
old.level <- compiler::enableJIT(level = 0L)
@@ -417,7 +425,7 @@ try(detach(package:ggplot2))
try(detach(package:tibble))
try(detach(package:profr))
try(detach(package:microbenchmark))
try(detach(package:rJava))
# try(detach(package:rJava))
# try(detach(package:rPython))
try(detach(package:inline))
try(detach(package:Rcpp))
12 changes: 6 additions & 6 deletions R.more.plotting.Rnw
@@ -589,13 +589,13 @@ citation(package = "ggExtra")
set.seed(12345)
my.data <-
data.frame(x = rnorm(200),
y = c(rnorm(100, -1, 1), rnorm(100, 1, 1)),
y = c(rnorm(100, -1, 1), rnorm(100, 1, 1)),
group = factor(rep(c("A", "B"), c(100, 100))) )
@

<<ggextra-01>>=
p01 <- ggplot(my.data, aes(x, y)) +
geom_point()
geom_point()
@

<<ggextra-02>>=
@@ -675,6 +675,10 @@ Please, see section \ref{sec:plot:ggpmisc} for an alternative approach, slightly

\section[`ggnetwork']{\pkgname{ggnetwork}}\label{sec:plot:ggnetwork}

\begin{infobox}
This is not the only package supporting the plotting of network graphs with package \ggplot. Packages \pkgname{GGally} and \pkgname{geomnet} also support network graphs. Package \pkgname{ggCompNet} compares the three methods, both for performance and by giving examples of the visual design.
\end{infobox}

<<>>=
citation(package = "ggnetwork")
@
@@ -728,10 +732,6 @@ What happens if you change the order of the \code{geom}s in the code above? Expe
Change the graphic design of the plot in steps, by changing: 1) the shape of the nodes, 2) the color of the nodes, 3) the size of the nodes and the size of the text, 4) the type of arrows and their size, 5) the font used in nodes to italic.
\end{playground}

\begin{infobox}
This is not the only package supporting the plotting of network graphs with package \ggplot. Packages \pkgname{GGally} and \pkgname{geomnet} support network graphs. Package \pkgname{ggCompNet} compares the three methods, both for performance and by giving examples of the visual design.
\end{infobox}

\section[`geomnet']{\pkgname{geomnet}}\label{sec:plot:geomnet}

<<>>=
39 changes: 39 additions & 0 deletions R.scripts.Rnw
@@ -715,6 +715,45 @@ How would you change this last example, so that only the last three columns are

There are many variants of \emph{apply} functions, both in base \langname{R} and exported by contributed packages. See section \ref{sec:data:apply} for details on the use of several of the latter ones.

\section{Object names as character strings}

In all assignment examples before this section, the names of the objects assigned to were written as part of the expressions. Sometimes, in scripts or packages, we may want to provide the name of the object to be assigned to as a character string. This requires the use of function \Rfunction{assign()} instead of the operator \code{<-}. The statements below demonstrate this.

<<assignx-01>>=
assign("a", 9.99)
a
assign("b", a)
b
@

The two toy examples above do not demonstrate why one may want to use \code{assign()}. In scripts and package code there are a few typical cases where we may want to use character strings to store (future or existing) object names: 1) we may want to allow the user to provide names, either interactively or as data, as character objects, 2) in an iterative loop we may want to traverse a vector or list of object names, or 3) we may want to construct object names at runtime based on data or settings. The loop below is an example of the third case.

<<assignx-02>>=
for (i in 1:5) {
assign(paste("zz_", i, sep = ""), i^2)
}
# pattern is a regular expression: names starting with "zz_"
ls(pattern = "^zz_")
@

The complementary operation is to \emph{get} an object when its name is available as a character string. We use function \Rfunction{get()}.

<<assignx-03>>=
a <- 555
get("a")
@
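
The second use case listed above, traversing a vector of object names in a loop, can be sketched with \code{get()} as follows. The objects \code{ww\_1} to \code{ww\_3} and the names \code{obj.names} and \code{total} are hypothetical, created only for this illustration.

<<>>=
# create three hypothetical objects: ww_1, ww_2, ww_3
for (i in 1:3) {
  assign(paste("ww_", i, sep = ""), i * 10)
}
# their names, stored as a character vector
obj.names <- paste("ww_", 1:3, sep = "")
# walk through the names, retrieving each object by name
total <- 0
for (name in obj.names) {
  total <- total + get(name)
}
total
@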

If we have a character vector containing object names and we want to create a list containing these objects, we can use function \Rfunction{mget()}. In the example below we use function \code{ls()} to obtain a character vector of object names matching a specific pattern.

<<assignx-04>>=
obj_names <- ls(pattern = "^zz_")
obj_lst <- mget(obj_names)
str(obj_lst)
@
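
As a sketch of what can be done with the list returned by \code{mget()}, the chunk below uses hypothetical objects \code{yy\_1} to \code{yy\_3} created only for this illustration. The members of the list are named after the objects, so they can be extracted by name or processed together with an \emph{apply} function.

<<>>=
# hypothetical objects, created only for this example
for (i in 1:3) {
  assign(paste("yy_", i, sep = ""), i^2)
}
yy_lst <- mget(paste("yy_", 1:3, sep = ""))
# list members carry the names of the objects
names(yy_lst)
# extract one member by name
yy_lst[["yy_2"]]
# apply a function to every member
sapply(yy_lst, sqrt)
@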

\begin{playground}
Think of possible uses of functions \code{assign()}, \code{get()} and \code{mget()} in scripts you use or could use to analyze your own data (or data from other sources). Write a script to implement this, and iteratively test and revise this script until the result produced by the script matches your expectations.
\end{playground}

\section{Packages}\label{sec:script:packages}
\index{packages!using}
In \langname{R} speak `library' is the location where `packages' are installed. Packages are sets of functions, and data, specific for some particular purpose, that can be loaded into an \langname{R} session to make them available so that they can be used in the same way as built-in \langname{R} functions and data. The function \code{library()} is used to load packages, already installed in the local \langname{R} library, into the current session, while the function \Rfunction{install.packages()} is used to install packages, either from a file, or directly from the internet into the library. When using RStudio it is easiest to use RStudio commands (which call \Rfunction{install.packages()} and \Rfunction{update.packages()}) to install and update packages.
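
A minimal sketch of the distinction, using package \code{tools}, which is distributed with \langname{R} and so can be assumed to be installed. The call to \code{install.packages()} is shown commented out, as it needs internet access and modifies the library.

<<>>=
# folders that make up the local R library
.libPaths()
# load an already installed package into the current session
library(tools)
"package:tools" %in% search()
# not run: install a package from a repository into the library
# install.packages("ggplot2")
@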
35 changes: 19 additions & 16 deletions appendixes.prj
@@ -4,52 +4,55 @@
1
1
using-r-main.Rnw
66
67
13
3
5

references.bib
BibTeX
1049586 0 104 7 193 1 0 0 820 242 0 1 41 160 -1 -1 0 0 23 0 0 23 1 0 1 193 0 -1 0
1049586 0 104 7 193 1 0 0 820 242 0 1 44 160 -1 -1 0 0 23 0 0 23 1 0 1 193 0 -1 0
using-r-main.Rnw
TeX:RNW:UTF-8
152055803 0 -1 21567 -1 21655 208 208 1244 731 0 1 41 256 -1 -1 0 0 198 -1 -1 198 2 0 21655 -1 1 5602 -1 0 -1 0
152055803 0 -1 21278 -1 21282 208 208 1244 731 0 1 204 80 -1 -1 0 0 198 -1 -1 198 2 0 21282 -1 1 5602 -1 0 -1 0
usingr.sty
TeX:STY
1060850 2 56 20 56 14 234 234 1270 724 0 0 129 208 -1 -1 0 0 25 0 0 25 1 0 14 56 0 0 0
1060850 2 56 20 56 14 234 234 1270 724 0 0 156 -303 -1 -1 0 0 25 0 0 25 1 0 14 56 0 0 0
R.data.Rnw
TeX:RNW
286273531 0 -1 46418 -1 46427 26 26 977 443 1 1 801 16624 -1 -1 0 0 31 -1 -1 31 2 0 46427 -1 1 73118 -1 0 -1 0
17838075 0 -1 10040 -1 10585 26 26 977 443 1 1 104 220 -1 -1 0 0 31 -1 -1 31 2 0 10585 -1 1 73182 -1 0 -1 0
R.more.plotting.Rnw
TeX:RNW
17838075 0 -1 41551 -1 54872 26 26 924 603 1 1 97 400 -1 -1 0 0 30 -1 -1 30 1 0 54872 -1 0 -1 0
17838075 0 -1 29825 -1 26849 26 26 924 603 1 1 104 -27363 -1 -1 0 0 30 -1 -1 30 1 0 26849 -1 0 -1 0
R.friends.Rnw
TeX:RNW
17838075 0 -1 12523 -1 12523 104 104 853 490 0 1 129 288 -1 -1 0 0 31 -1 -1 31 1 0 12523 -1 0 -1 0
286273531 0 -1 12173 -1 12173 104 104 853 490 0 1 504 420 -1 -1 0 0 31 -1 -1 31 1 0 12173 -1 0 -1 0
R.plotting.Rnw
TeX:RNW
17838075 2 -1 6842 -1 6836 130 130 1166 559 1 1 225 192 -1 -1 0 0 31 -1 -1 31 4 0 6836 -1 1 43325 -1 2 161802 -1 3 161802 -1 0 -1 0
17838075 2 -1 6842 -1 6836 130 130 1166 559 1 1 1114 -72143 -1 -1 0 0 31 -1 -1 31 4 0 6836 -1 1 43325 -1 2 161802 -1 3 161802 -1 0 -1 0
R.maps.Rnw
TeX:RNW
17838075 1 -1 9 -1 33 64 64 974 522 1 1 353 0 -1 -1 0 0 57 -1 -1 57 1 0 33 -1 0 -1 0
17838075 1 -1 9 -1 33 64 64 974 522 1 1 434 -16284 -1 -1 0 0 57 -1 -1 57 1 0 33 -1 0 -1 0
R.intro.Rnw
TeX:RNW
17838075 0 -1 31770 -1 34649 182 182 1218 705 1 1 473 420 -1 -1 0 0 47 -1 -1 47 1 0 34649 -1 0 -1 0
17838075 0 -1 31770 -1 34649 182 182 1218 705 1 1 374 420 -1 -1 0 0 47 -1 -1 47 1 0 34649 -1 0 -1 0
rbooks.bib
BibTeX:UNIX
1147890 0 161 29 162 13 52 52 872 313 1 1 185 320 -1 -1 0 0 21 0 0 21 1 0 13 162 0 -1 0
1147890 0 161 29 162 13 52 52 872 313 1 1 224 320 -1 -1 0 0 21 0 0 21 1 0 13 162 0 -1 0
R.as.calculator.Rnw
TeX:RNW
1060859 0 886 126 -1 37182 26 26 1062 549 1 1 89 384 -1 -1 0 0 31 -1 -1 31 1 0 37182 -1 0 -1 0
1060859 0 -1 5804 -1 5802 26 26 1062 549 1 1 104 2260 -1 -1 0 0 31 -1 -1 31 1 0 5802 -1 0 -1 0
R.scripts.Rnw
TeX:RNW
17838075 0 -1 24645 -1 32975 78 78 1114 601 1 1 153 384 -1 -1 0 0 31 -1 -1 31 1 0 32975 -1 0 -1 0
17838075 0 -1 41800 -1 41800 78 78 1114 601 1 1 104 300 -1 -1 0 0 31 -1 -1 31 1 0 41800 -1 0 -1 0
R.functions.Rnw
TeX:RNW
17838075 4 -1 11545 -1 11546 130 130 1166 653 0 1 177 400 -1 -1 0 0 190 -1 -1 190 1 0 11546 -1 0 -1 0
17838075 4 -1 11545 -1 11546 130 130 1166 653 0 1 1384 400 -1 -1 0 0 190 -1 -1 190 1 0 11546 -1 0 -1 0
using-r-main.tex
TeX
269496315 7 -1 37139 -1 37156 96 96 1082 496 0 1 473 208 -1 -1 0 0 73 -1 -1 73 1 0 37156 -1 0 -1 0
269496315 7 -1 37139 -1 37159 96 96 1082 496 0 1 684 480 -1 -1 0 0 73 -1 -1 73 1 0 37159 -1 0 -1 0
hormiguero-ddns-error.txt
ASCII
273688443 0 0 1 0 1 32 32 1181 410 1 0 86 0 -1 -1 0 0 -1 -1 -1 -1 1 0 1 0 0 0 0
:\aphalo\Documents\RPackages\learnr-pkg\inst\extdata\areatable.dat
DATA
273678578 0 0 1 0 1 96 96 1246 475 1 0 86 0 -1 -1 0 0 301 0 0 301 1 0 1 0 0 0 0