[R] x %>% y as an alternative to which( x > y)

Duncan Murdoch murdoch.duncan at gmail.com
Tue Sep 13 23:31:15 CEST 2011


On 11-09-13 5:17 PM, Timothy Bates wrote:
> Dear Duncan and Hadley,
>
> I stumbled across the NA behavior of subset a little while ago and thought it might do the trick. But my common use case is not getting a subset without NAs, but setting values in the whole data frame.
>
> So I need T/F at each row, not just the list of rows that match the subset of matching cases...
>
> How would you do this with subset?
>
>     data[data$YOB < 1908 & !is.na(data$YOB), "Age"] = NA

Unlike Hadley, I didn't mean that you should use the subset() function;
I was just talking about computing the subset first and doing the rest
later.  So you would write that as something like

complete <- !is.na(data$YOB)
data[complete & data$YOB < 1908, "Age"] <- NA

Of course, this isn't really necessary when you're only checking one 
variable, but completeness tests are often more complicated.
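
As a rough sketch of a more complicated completeness test (the Sex
column and the sample values are made up for illustration, not taken
from your data), complete.cases() can build the mask over several
variables at once:

```r
# Hypothetical data frame; only YOB and Age appear in the original example.
data <- data.frame(YOB = c(1900, NA, 1950, 1905),
                   Sex = c("male", "female", NA, "male"),
                   Age = c(111, 40, 61, 106))

# TRUE only for rows where every variable used in the test is non-missing
complete <- complete.cases(data[, c("YOB", "Sex")])

# FALSE & NA is FALSE, so incomplete rows drop out of the index
sel <- complete & data$YOB < 1908 & data$Sex == "male"
data[sel, "Age"] <- NA
```

Because complete is computed once, the actual condition stays free of
repeated !is.na() calls.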

More below...
> My %<% idea extends the vocabulary established by %in%, and works in the same grammatical situation.
>
> here's a real example
>
> # Fix missing T2 sex for same sex pairs...
>
> twinData[twinData$Age %<% 12, "flynnEffect"] = FALSE # only set flynn F for people under 12, not inc NAs
>
> Addressing Duncan's point about returning a logical array... the %<% function should be:
>
> "%<%" <- function(table, x) {
> 	lessThan = table < x
> 	lessThan[is.na(lessThan)] = FALSE
> 	return(lessThan)
> }

I think that still doesn't work quite right.  You want the conversion of 
NA to FALSE to happen as the last part of evaluating an expression, not 
in intermediate steps.  Otherwise

!(a %<% 10)

will give TRUE for NA values, which is probably not what you want if
your intention was to skip NA cases.
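
To make that concrete, here is a small sketch with a made-up vector a,
using the %<% definition quoted above (only the assignment style and
the return are tidied):

```r
# %<% as in the quoted message: comparisons involving NA become FALSE
"%<%" <- function(table, x) {
  lessThan <- table < x
  lessThan[is.na(lessThan)] <- FALSE
  lessThan
}

a <- c(5, NA, 15)   # hypothetical values

early <- a %<% 10       # TRUE FALSE FALSE
negated <- !(a %<% 10)  # FALSE TRUE TRUE: the NA element is now selected

# Keep the NA through the whole expression and convert once, at the end
late <- !(a < 10)           # FALSE NA TRUE
late[is.na(late)] <- FALSE  # FALSE FALSE TRUE
```

With the conversion done last, the NA element is skipped whether or
not the condition is negated.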

Duncan Murdoch

> This also works for matrices as it should
>
>> x = matrix(c(1:10,NA,12:20),nrow=2)
>> x %<% 6
>       [,1] [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
> [1,] TRUE TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> [2,] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>
>
> On Sep 13, 2011, at 8:40 PM, Hadley Wickham wrote:
>
>>> Because in coding, I often end up with big chunks looking like this:
>>>
>>> ((mydataframeName$myvariableName > 2 & !is.na(mydataframeName$myvariableName)) & (mydataframeName$myotherVariableName == "male" & !is.na(mydataframeName$myotherVariableName)))
>>>
>>> Which is much less readable/maintainable/editable than
>>>
>>> mydataframeName$myvariableName > 2 & mydataframeName$myotherVariableName == "male"
>> Use subset:
>>
>> subset(mydataframeName, myvariableName > 2 & myotherVariableName == "male")
>>
>> (subset automatically treats NAs as false)
>>
>> Hadley
>>
>> -- 
>> Assistant Professor / Dobelman Family Junior Chair
>> Department of Statistics / Rice University
>> http://had.co.nz/
>>


