Non-standard-evaluation

NOW OUTDATED!

The main goal of plyrmr is to provide close big-data equivalents of well-known and useful data frame manipulations, such as transform and subset. So why reinvent the wheel with bind.cols, transmute and where? The main reason is that transform and subset work best interactively, at the prompt, but they have problems when used inside other functions or packages. These limitations are inherited from the base package functions and are not peculiar to their plyrmr brethren. plyrmr attempts to provide two functions that match the convenience of transform and subset without their pitfalls. While we were at it, we also tried to make them more general and give them a cleaner but still familiar (SQL-inspired) interface. Let me introduce bind.cols and where. These are plyrmr functions with methods for data frames and Hadoop data sets, and they are appropriate for both interactive and programming use. The previous examples become, using these functions:

where(
	bind.cols(
		mtcars, 
		carb.per.cyl = carb/cyl), 
	carb.per.cyl >= 1)
               mpg cyl disp  hp vs gear carb carb.per.cyl
Ferrari Dino  19.7   6  145 175  0    5    6            1
Maserati Bora 15.0   8  301 335  0    5    8            1

and:

where(
	bind.cols(
		input("/tmp/mtcars"),
		carb.per.cyl = carb/cyl),
	carb.per.cyl >= 1)
               mpg cyl disp  hp vs gear carb carb.per.cyl
Ferrari Dino  19.7   6  145 175  0    5    6            1
Maserati Bora 15.0   8  301 335  0    5    8            1

Similar, but they work everywhere. For instance, if subset is called within some function, which is in turn used in some other function, we can have the following situation:

subset.mtcars = function(...) subset(mtcars, ...)
high.carb.cyl = function(x) {subset.mtcars(carb/cyl >= x) }
high.carb.cyl(1)
Error: object 'x' not found

Unfortunately, it doesn't work. With where instead:

where.mtcars = 
	function(...) where(mtcars, ...)
high.carb.cyl = function(x) {where.mtcars(carb/cyl >= x) }
high.carb.cyl(1)
               mpg cyl disp  hp vs gear carb
Ferrari Dino  19.7   6  145 175  0    5    6
Maserati Bora 15.0   8  301 335  0    5    8
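
The nested call succeeds because the condition is evaluated where it was written, so x is found in the calling function. As a rough illustration of that idea in base R only, the sketch below passes the condition as a one-sided formula, which records its defining environment. This is not plyrmr's implementation: where.sketch, where.sketch.mtcars and high.carb.cyl.sketch are hypothetical names, and the formula interface is an assumption made to keep the example self-contained.

```r
# Hypothetical helper, not plyrmr's implementation: the condition arrives as a
# one-sided formula, which remembers the environment where it was created.
where.sketch = function(.data, .cond) {
  stopifnot(inherits(.cond, "formula"), length(.cond) == 2)
  # Resolve names in the data frame first, then in the formula's environment.
  keep = eval(.cond[[2]], .data, environment(.cond))
  # Drop rows where the condition is FALSE or NA.
  .data[!is.na(keep) & keep, , drop = FALSE]
}

# Same two levels of nesting as in the example above.
where.sketch.mtcars = function(...) where.sketch(mtcars, ...)
high.carb.cyl.sketch = function(x) where.sketch.mtcars(~ carb/cyl >= x)

result = high.carb.cyl.sketch(1)
rownames(result)
```

Here carb and cyl are looked up in the data frame, while x is looked up in the environment captured by the formula, which is the body of high.carb.cyl.sketch.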

The R documentation recommends using only [] when programming, but having to rewrite code for a different context is, to a computer scientist, an admission of defeat, and it negates one of the reasons R is so successful (many thanks to Hadley Wickham for valuable discussions on this subject; see also this GitHub issue). Similar dplyr functions used to have the same problem as the base functions, but the latest version has fixed the evaluation behavior (so well, in fact, that we have switched to using their approach). For the relation between plyrmr and dplyr, see Plyrmr vs. dplyr.
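
For comparison, the [] rewrite recommended for programming could look like the sketch below. The function name high.carb.cyl.bracket and its data argument are hypothetical, introduced only for illustration. The filter is safe to nest in other functions, but every column has to be qualified explicitly, which is the rewriting cost discussed above.

```r
# Hypothetical function name. Standard evaluation with `[`: x is an ordinary
# local variable, so the function behaves the same at any nesting depth.
high.carb.cyl.bracket = function(x, data = mtcars) {
  keep = data$carb / data$cyl >= x
  # Drop rows where the condition is FALSE or NA.
  data[!is.na(keep) & keep, , drop = FALSE]
}

rownames(high.carb.cyl.bracket(1))
```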