# [R] Vectorizing a loop

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Tue Feb 7 23:44:44 CET 2012

On Tue, 7 Feb 2012, David Winsemius wrote:

>
> On Feb 7, 2012, at 12:56 PM, Jeff Newmiller wrote:
>
>> On Tue, 7 Feb 2012, Alexander Shenkin wrote:
>>
>>> Hello Folks,
>>>
>>> I'm trying to vectorize a loop that processes rows of a dataframe.  It
>>> involves lots of conditionals, such as "If column 10 == 3, and if column
>>> 3 is True, and both column 5 and 6 are False, then set column 4 to True".
>>>
>>> So, for example, any ideas about vectorizing the following?
>>>
>>> df = data.frame( list(a=c(1,2,3,4), b=c("a","b","c","d"), c=c(T,F,T,F),
>>> d=NA, e=c(F,F,T,T)) )
>>>
>>> for (i in 1:nrow(df)) {
>>>
>>>  if (df[i,3] %in% c(FALSE,NA) & (df[i,1] > 2 | df[i,5]) ) {
>>>      df[i,4] = 1
>>>  }
>>>
>>>  if (df[i,5] %in% c(TRUE, NA) & df[i,2] == "b") {
>>>      df[i,4] = 2
>>>      df[i,5] = T
>>>  }
>>>
>>> }
>>
>> Your code attempts to do some things with NA that won't behave the way
>> you expect them to. Specifically, you cannot use %in% to test for NA,
>
> Huh?
>
>> NA %in% NA
> [1] TRUE
>> NA %in% c(5, NA)
> [1] TRUE
>> NA %in% c(5, 6)
> [1] FALSE

Sorry, SQL rules bleeding through... %in% is clearly more forgiving in R
than IN is in SQL. However, the second if checks whether df[i,5] is NA,
while the first if does not. Since a comparison with NA is neither FALSE
nor TRUE, the first test can evaluate to NA, which if() cannot act on.

> NA | 1
[1] TRUE
> NA & 1
[1] NA
> NA > 1
[1] NA
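
To make the consequence for if() concrete, here is a minimal sketch. The
values a and e are hypothetical stand-ins for one row in which column e is
NA (the sample data has no such row), and cond, res and safe are names
introduced only for this illustration.

```r
# Hypothetical single-row values: column a is 1, column e is missing.
a <- 1
e <- NA

# FALSE | NA cannot be decided, so the result is NA.
cond <- a > 2 | e
print(cond)

# if() stops with an error on an NA condition; try() captures that error
# so the script can carry on.
res <- try(if (cond) "yes" else "no", silent = TRUE)
print(inherits(res, "try-error"))

# isTRUE() is one way to guard the test: it treats NA as "not TRUE".
safe <- isTRUE(cond)
print(safe)
```

Whether NA should count as TRUE or FALSE depends on what the data mean, so
an explicit is.na() check is often the clearer choice.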

>> # intermediate logical vectors for clarity
>> tmp <- ( is.na(df[[3]]) | !df[[3]] ) & ( df[[1]] > 2 | df[[5]] )
>> tmp2 <- ( is.na(df[[5]]) | df[[5]] ) & df[[2]] == "b"
>> df[ tmp, "d" ] <- 1
>> df[ tmp2, "d" ] <- 2
>> df[ tmp2, "e" ] <- TRUE
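
A quick way to gain some confidence in the vectorized version is to run it
next to the original loop on the sample data and compare the results. The
sketch below does that; df_loop, df_vec and same are names introduced here,
and agreement on four rows is only a sanity check, not a proof.

```r
# Sample data from the original post, with TRUE/FALSE spelled out.
df <- data.frame(a = c(1, 2, 3, 4),
                 b = c("a", "b", "c", "d"),
                 c = c(TRUE, FALSE, TRUE, FALSE),
                 d = NA,
                 e = c(FALSE, FALSE, TRUE, TRUE))

# Row-by-row version, as posted.
df_loop <- df
for (i in 1:nrow(df_loop)) {
  if (df_loop[i, 3] %in% c(FALSE, NA) & (df_loop[i, 1] > 2 | df_loop[i, 5])) {
    df_loop[i, 4] <- 1
  }
  if (df_loop[i, 5] %in% c(TRUE, NA) & df_loop[i, 2] == "b") {
    df_loop[i, 4] <- 2
    df_loop[i, 5] <- TRUE
  }
}

# Vectorized version: build both logical masks first, then assign.
df_vec <- df
tmp  <- (is.na(df_vec[[3]]) | !df_vec[[3]]) & (df_vec[[1]] > 2 | df_vec[[5]])
tmp2 <- (is.na(df_vec[[5]]) | df_vec[[5]]) & df_vec[[2]] == "b"
df_vec[tmp,  "d"] <- 1
df_vec[tmp2, "d"] <- 2
df_vec[tmp2, "e"] <- TRUE

# Compare the two results.
same <- isTRUE(all.equal(df_loop, df_vec))
print(same)
```

Note that a data frame cannot be assigned into with a logical index that
contains NA, so if columns a or e may hold NA the masks need an extra guard
(for example, tmp & !is.na(tmp)) before the assignments.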

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...