[R] Too many NA's from mean(..., na.rm=T) when a column is all NA's

Mon Jun 13 19:05:46 CEST 2005

Dear R-help folks,

I am seeing unexpected behaviour from mean() with na.rm = TRUE when
every value in a column of a data frame or matrix is NA, so that
na.rm = TRUE removes the whole column.

example:

testcase <- data.frame( x = 1:3, y = rep(NA,3))

mean(testcase[,1], na.rm=TRUE)
[1] 2
mean(testcase[,2], na.rm = TRUE)
[1] NaN
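The NaN itself is what R should give here: with na.rm = TRUE every value of y is dropped, so mean() ends up averaging an empty vector, which amounts to 0/0. A quick check at the prompt (the names y and kept are just for illustration):

```r
y <- rep(NA, 3)              # an all-NA (logical) column, like testcase$y
kept <- y[!is.na(y)]         # what na.rm = TRUE leaves behind
length(kept)                 # [1] 0  -- nothing left to average
sum(kept) / length(kept)     # [1] NaN -- 0/0
mean(y, na.rm = TRUE)        # [1] NaN -- same answer from mean()
```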

  OK, so far that seems sensible.  Now I'd like to compute both means at
once:

  lapply(testcase, mean, na.rm=T)   ## this works
$x
[1] 2

$y
[1] NaN
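lapply() works because it walks the columns of the data frame one at a time, and each column keeps its own type. If a named numeric vector is more convenient than a list, sapply() (or vapply(), which also checks that every result is a single number) should give the same values; a sketch on the same testcase:

```r
testcase <- data.frame(x = 1:3, y = rep(NA, 3))

# sapply() simplifies the list of per-column means to a named vector
col_means <- sapply(testcase, mean, na.rm = TRUE)
col_means    # x = 2, y = NaN

# vapply() does the same, but insists each result is one double
col_means2 <- vapply(testcase, mean, numeric(1), na.rm = TRUE)
```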

  But I thought that this would also work:

apply(testcase, 2, mean, na.rm=T)
 x  y
NA NA
Warning messages:
1: argument is not numeric or logical: returning NA in:
mean.default(newX[, i], ...)
2: argument is not numeric or logical: returning NA in:
mean.default(newX[, i], ...)
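A likely explanation (an assumption on my part, not confirmed above): apply() is written for arrays, so it first converts the data frame with as.matrix(). A matrix holds a single type, and the warnings say mean() was handed non-numeric columns, which suggests the logical y column turned the whole matrix into character in this R version (other versions may convert it differently). A sketch that sidesteps the issue by building a numeric matrix explicitly before calling apply():

```r
testcase <- data.frame(x = 1:3, y = rep(NA, 3))

# Inspect what apply() would actually work on (its storage type)
str(as.matrix(testcase))

# Convert every column to double first, so the matrix is numeric
# whatever type each column started as (logical NA becomes NA_real_)
m <- sapply(testcase, as.numeric)
res <- apply(m, 2, mean, na.rm = TRUE)
res    # x = 2, y = NaN
```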

 Summary:
  If I have a data frame or a matrix where one entire column is NA,
mean(x, na.rm=T) on that column alone works and returns NaN, but
apply(x, 2, mean, na.rm=T) returns NA for ALL columns, with a warning
for each one.
  lapply works fine on the data frame.
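For comparison, a sketch with a matrix that is numeric from the start (the matrix case above is not shown, so how it was built is an assumption; the name m is hypothetical). Here apply() and colMeans() should both return NaN only for the all-NA column:

```r
# Hypothetical numeric matrix: column "y" is entirely NA, stored as double
m <- matrix(c(1, 2, 3, NA, NA, NA), ncol = 2,
            dimnames = list(NULL, c("x", "y")))

by_apply    <- apply(m, 2, mean, na.rm = TRUE)  # x = 2, y = NaN
by_colmeans <- colMeans(m, na.rm = TRUE)        # same, in one call
```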

  If you wonder why I'm building data frames with columns that could be
all missing -- they arise as output of a simulation.  The fact that the
entire column is missing is informative in itself.

  I do wonder if this is a bug.

Thanks,
Jim

Jim Robison-Cox
Department of Math Sciences
2-214 Wilson Hall
Montana State University
Bozeman, MT 59717-2400
phone: (406)994-5340
FAX:   (406)994-1789
e-mail: jimrc at math.montana.edu