[R] x %>% y as an alternative to which( x > y)

Timothy Bates timothy.c.bates at gmail.com
Tue Sep 13 23:17:37 CEST 2011


Dear Duncan and Hadley,

I stumbled across the NA behavior of subset a little while ago and thought it might do the trick. But my common use case is not extracting a subset without NAs; it is setting values within the whole data frame.

So I need a TRUE/FALSE value for every row, not just the rows that match the condition...

How would you do this with subset?

   data[data$YOB < 1908 & !is.na(data$YOB), "Age"]=NA
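For comparison, here is a minimal sketch on invented toy data (the column names follow the line above; the values are assumptions made up for illustration). It suggests that the guarded logical index and the which() form from the subject line select the same rows for assignment. The difference is that which() returns row positions, while the guarded form keeps one TRUE/FALSE per row.

```r
# Toy data: column names follow the message, values are invented for illustration.
data <- data.frame(YOB = c(1900, NA, 1950, 1905), Age = c(80, 40, 30, 75))

# Explicit NA guard: a full-length logical vector that is FALSE where YOB is NA.
isOld <- data$YOB < 1908 & !is.na(data$YOB)

guarded <- data
guarded[isOld, "Age"] <- NA

# which() also drops the NA comparison, but yields row positions
# rather than a logical vector.
viaWhich <- data
viaWhich[which(data$YOB < 1908), "Age"] <- NA

identical(guarded, viaWhich)
```

Both copies end up with Age set to NA in rows 1 and 4 only; the row with the missing YOB is left alone.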

My %<% idea extends the vocabulary established by %in%, and works in the same grammatical situation.
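To illustrate the %in% behavior this builds on, a small sketch with an invented vector: == propagates NA, whereas %in% reports FALSE for a missing element, which is the property the proposed operator borrows.

```r
# Invented example vector containing one missing value.
x <- c(5, NA, 20)

viaEquals <- x == 5    # the NA element stays NA
viaIn     <- x %in% 5  # the NA element becomes FALSE

viaEquals
viaIn
```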

Here's a real example:

# Fix missing T2 sex for same sex pairs...

twinData[twinData$Age %<% 12, "flynnEffect"] = FALSE # only set flynn F for people under 12, not inc NAs

Addressing Duncan's point about returning a logical array... the %<% function should be:

"%<%" <- function(table, x){
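	# Elementwise comparison; the result is NA wherever the comparison involves an NA
	lessThan = table < x
	# Treat NA comparisons as non-matches so the result can be used as an index
	lessThan[is.na(lessThan)] = FALSE
	return(lessThan)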
} 

This also works for matrices, as it should:

> x = matrix(c(1:10,NA,12:20),nrow=2)
> x %<% 6
     [,1] [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
[1,] TRUE TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[2,] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
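The same operator can be tried on a data frame. The sketch below repeats the definition from this message so that it runs on its own; the twinData values are invented for illustration and are not the real data set.

```r
# Operator as defined in this message, repeated so the sketch is self-contained.
"%<%" <- function(table, x) {
  lessThan <- table < x
  lessThan[is.na(lessThan)] <- FALSE
  lessThan
}

# Hypothetical stand-in for twinData; the ages are made up.
twinData <- data.frame(Age = c(10, NA, 15, 8), flynnEffect = TRUE)

# Rows with Age under 12 are set to FALSE; the row with a missing Age is untouched.
twinData[twinData$Age %<% 12, "flynnEffect"] <- FALSE

twinData$flynnEffect
```

With these made-up ages, only rows 1 and 4 are changed. Because the operator returns a plain logical vector, it should also combine with other conditions through & and | in the usual way.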


On Sep 13, 2011, at 8:40 PM, Hadley Wickham wrote:

>> Because in coding, I often end up with big chunks looking like this:
>> 
>> ((mydataframeName$myvariableName > 2 & !is.na(mydataframeName$myvariableName)) & (mydataframeName$myotherVariableName == "male" & !is.na(mydataframeName$myotherVariableName)))
>> 
>> Which is much less readable/maintainable/editable than
>> 
>> mydataframeName$myvariableName > 2 & mydataframeName$myotherVariableName == "male"
> 
> Use subset:
> 
> subset(mydataframeName, myvariableName > 2 & myotherVariableName == "male")
> 
> (subset automatically treats NAs as false)
> 
> Hadley
> 
> -- 
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
> 


