[R] rowSums()

Wed Sep 24 16:38:23 CEST 2008

on 09/24/2008 09:06 AM Doran, Harold wrote:
> Say I have the following data:
> 
> testDat <- data.frame(A = c(1,NA,3), B = c(NA, NA, 3))
> 
>> testDat
>    A  B
> 1  1 NA
> 2 NA NA
> 3  3  3
> 
> rowSums() with na.rm=TRUE generates the following, which is not desired:
> 
>> rowSums(testDat[, c('A', 'B')], na.rm=T)
> [1] 1 0 6
> 
> rowSums() with na.rm=F generates the following, which is also not
> desired:
> 
> 
>> rowSums(testDat[, c('A', 'B')], na.rm=F)
> [1] NA NA  6
> 
> I see why this occurs, but what I hope to have returned would be:
> [1] 1 NA  6
> 
> To get what I want I could do the following, but normally my ideas are
> bad ideas and there are codified and proper ways to do things. 
> 
> rr <- numeric(nrow(testDat))
> for(i in 1:nrow(testDat)) rr[i] <- if(all(is.na(testDat[i,]))) NA else
> sum(testDat[i,], na.rm=T)
> 
>> rr
> [1]  1 NA  6
> 
> Is there a "proper" way to do this? In my real data, nrow is over
> 100,000
> 
> Thanks,
> Harold

The behavior you observe is documented in ?rowSums in the Value section:

If there are no values in a range to be summed over (after removing
missing values with na.rm = TRUE), that component of the output is set
to 0 (*Sums) or NA (*Means), consistent with sum and mean.

So:

> sum(c(NA, NA), na.rm = TRUE)
[1] 0

This follows from the convention that the sum of an empty set is 0,
which I got burned on myself a while back.
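
To make the empty-sum point concrete, here is a small sketch (the
variable names are only for illustration): once every NA has been
dropped there is nothing left to add, and rowSums() applies the same
rule to each row.

```r
# Removing every NA leaves a zero-length vector, whose sum is 0.
allMissing <- c(NA_real_, NA_real_)
emptySum <- sum(allMissing[!is.na(allMissing)])  # same as sum(numeric(0))

# rowSums() follows the same rule row by row.
# matrix() fills by column, so row 1 is (1, NA) and row 2 is (NA, NA).
m <- matrix(c(1, NA, NA, NA), nrow = 2)
rowTotals <- rowSums(m, na.rm = TRUE)
```

Here emptySum is 0, and the all-NA second row of m also comes back as 0
rather than NA.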

You could feasibly use:

  Res <- rowSums(testDat, na.rm = TRUE)
  is.na(Res) <- rowSums(is.na(testDat)) == ncol(testDat)
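
If you need this in more than one place, the two lines above could be
wrapped in a small function. This is only a sketch, and the name
rowSumsNA is made up for illustration (it is not part of base R). It
stays vectorized, so it should be workable for 100,000+ rows, though I
have not timed it on data of that size.

```r
# Hypothetical helper (name chosen for illustration, not a base R function):
# row sums that ignore NAs, but return NA for rows that are entirely NA.
rowSumsNA <- function(x) {
  res <- rowSums(x, na.rm = TRUE)
  # A row is all-NA when its count of NAs equals the number of columns.
  is.na(res) <- rowSums(is.na(x)) == ncol(x)
  res
}

testDat <- data.frame(A = c(1, NA, 3), B = c(NA, NA, 3))
result <- rowSumsNA(testDat)
```

For testDat, result holds the values 1, NA and 6, which matches what
the loop in the original post produces.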

HTH,

Marc Schwartz