[R] x %>% y as an alternative to which( x > y)
Timothy Bates
timothy.c.bates at gmail.com
Tue Sep 13 23:17:37 CEST 2011
Dear Duncan and Hadley,
I stumbled across the NA behavior of subset a little while ago and thought it might do the trick. But my common use case is not extracting a subset without NAs; it is setting values in the whole data frame.
So I need a TRUE/FALSE value for every row, not just the rows that match the condition.
How would you do this with subset?
data[data$YOB < 1908 & !is.na(data$YOB), "Age"]=NA
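For illustration, the assignment above can be tried on a toy data frame. The values below are invented, and the sketch only compares the explicit is.na() guard with which(), which returns the matching row numbers and drops NA comparisons by construction.

```r
# Toy data (values are invented for illustration only)
data <- data.frame(YOB = c(1900, 1950, NA, 1905),
                   Age = c(111, 61, 40, 106))

# Explicit guard: the index is TRUE/FALSE for every row, never NA
idx <- data$YOB < 1908 & !is.na(data$YOB)
data[idx, "Age"] <- NA

# which() gives row numbers instead, skipping the NA comparison
rows <- which(data$YOB < 1908)
```

Both forms select the same rows here; the difference is that idx has one entry per row while rows lists only the matches.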
My %<% idea extends the vocabulary established by %in%, and works in the same grammatical situation.
Here's a real example:
# Fix missing T2 sex for same sex pairs...
twinData[twinData$Age %<% 12, "flynnEffect"] = FALSE # only set flynn F for people under 12, not inc NAs
Addressing Duncan's point about returning a logical array... the %<% function should be:
"%<%" <- function(table, x){
    lessThan = table < x
    lessThan[is.na(lessThan)] = FALSE
    return(lessThan)
}
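A small sketch of how this behaves on a vector with a missing value, compared with the plain `<` operator. The definition is repeated so the snippet runs on its own; the `age` values and the `flynnEffect` vector are invented stand-ins for the twinData columns.

```r
# Definition repeated from the message so the sketch is self-contained
"%<%" <- function(table, x) {
  lessThan <- table < x
  lessThan[is.na(lessThan)] <- FALSE
  lessThan
}

age <- c(10, NA, 15, 8)   # invented values

plain <- age < 12         # keeps the NA at position 2
safe  <- age %<% 12       # the NA has become FALSE

# The NA-free index can be used directly in an assignment
flynnEffect <- rep(TRUE, length(age))
flynnEffect[safe] <- FALSE
```

Only the entries where age is known and below 12 are changed; the entry with a missing age is left alone.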
This also works for matrices, as it should:
> x = matrix(c(1:10,NA,12:20),nrow=2)
> x %<% 6
     [,1] [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
[1,] TRUE TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[2,] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
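The same NA-to-FALSE step could also wrap a whole compound condition, such as the two-variable test quoted below. This is only a sketch: `naFalse` is a hypothetical helper name that does not appear in the thread, and the data frame is invented.

```r
# Hypothetical helper (not from the thread): replace NA with FALSE in a
# logical vector or matrix, keeping its dimensions
naFalse <- function(cond) {
  cond[is.na(cond)] <- FALSE
  cond
}

# Invented data with missing values in both columns
df <- data.frame(score = c(3, NA, 1, 5, 4),
                 sex   = c("male", "male", "male", NA, "female"),
                 stringsAsFactors = FALSE)

# One wrapper around the combined condition instead of an is.na()
# guard on each variable
keep <- naFalse(df$score > 2 & df$sex == "male")
```

For conditions joined with `&`, wrapping the combined expression gives the same index as guarding each term separately, because a row counts as a match only when every term is known to be TRUE.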
On Sep 13, 2011, at 8:40 PM, Hadley Wickham wrote:
>> Because in coding, I often end up with big chunks looking like this:
>>
>> ((mydataframeName$myvariableName > 2 & !is.na(mydataframeName$myvariableName)) & (mydataframeName$myotherVariableName == "male" & !is.na(mydataframeName$myotherVariableName)))
>>
>> Which is much less readable/maintainable/editable than
>>
>> mydataframeName$myvariableName > 2 & mydataframeName$myotherVariableName == "male"
>
> Use subset:
>
> subset(mydataframeName, myvariableName > 2 & myotherVariableName == "male")
>
> (subset automatically treats NAs as false)
>
> Hadley
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
>