Alex Shum bio photo

Alex Shum

Data Scientist at Apple.

Twitter LinkedIn Github Stackoverflow

Non standard evaluation is one of the R language’s strange features; I think it’s stupid. Here’s some R code that evaluates as you would expect (in a sane way):

df = data.frame("hello world", 
                5, FALSE, 
                stringsAsFactors = FALSE)
names(df) = c("a", "b", "c")
f = function(x) x

In the above code I create a dataframe with three 3 columns: a string, numeric and a boolean. I also define function f(x) which simply returns x.

f(df$a) = hello world
f(df$b) = 5
f(df$c) = FALSE

When the function argument x is evaluated in function f(x) it will evaluate df$a, df$b and df$c and look in the dataframe and give the information in the correct column of the dataframe.

Let’s rewrite f(x) so that it works with non-standard evaluation (the insane way):

f = function(x) deparse(substitute(x))

f(x) now no longer evaluates df$a, df$b or df$c when they are passed in as arguments:

f(df$a) = df$a
f(df$a) = df$b
f(df$a) = df$c

Instead substitute(x) will capture the input as an expression without evaluating it and deparse(x) will turn this expression into a string.

Why is non-standard evaluation annoying? If the non-standard evaluation version of f(x) is some important function I need to call but my input is stored in a data frame or list (as in the above example) then I’m going to have a bad day.

library() is an example of a real R function that uses non-standard evaluation. library(randomForest) and library("randomForest") both evaluate correctly. But if you have string = "randomForst", then library(string) will try to load the “string” library. This will come into play if you use something like the shiny package where a lot of input from the UI comes from input$ui_object. See this issue with shiny and non-standard evaluation.

Non-standard evaluation allows you to capture expressions and do things like avoid using quotations when naming dataframe columns. In fact, when I defined the data.frame in the above code segment I could have defined it also using non-standard evaluation:

df = data.frame(a = "hello world", 
                b = 5, c = FALSE, 
                stringsAsFactors = FALSE)

Are quotations really that much of a hassle to add to your code? This is especially true for a language where the puritanical users prefer the <- assignment operator instead of =. Do the non-evaluation shortcuts makeup for all the hoops you need to jump through?

I’m sure there’s some clever hack to get around the non-standard evaluation in the above examples. But this is still an annoying complication introduced by an advanced “feature”.