[R] Too many NA's from mean(..., na.rm=T) when a column is all NA's
Jim Robison-Cox
jimrc at math.montana.edu
Mon Jun 13 19:05:46 CEST 2005
Dear R-help folks,
I am seeing unexpected behaviour from the function mean
with the option na.rm = TRUE when that option removes every value in a
whole column of a data frame or matrix.
Example:
testcase <- data.frame( x = 1:3, y = rep(NA,3))
mean(testcase[,1], na.rm=TRUE)
[1] 2
mean(testcase[,2], na.rm = TRUE)
[1] NaN
OK, so far that seems sensible. Now I'd like to compute both means at
once:
lapply(testcase, mean, na.rm=T) ## this works
$x
[1] 2
$y
[1] NaN
But I thought that this would also work:
apply(testcase, 2, mean, na.rm=T)
x y
NA NA
Warning messages:
1: argument is not numeric or logical: returning NA in:
mean.default(newX[, i], ...)
2: argument is not numeric or logical: returning NA in:
mean.default(newX[, i], ...)
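One way to look into this warning is sketched below. It relies on the documented behaviour that apply converts a data frame with as.matrix before looping over columns, so inspecting the column classes and the storage type of the converted matrix shows what mean actually receives. No output is shown because the result of the conversion may differ between R versions; the names col_classes and m are made up for this illustration.

```r
testcase <- data.frame(x = 1:3, y = rep(NA, 3))

# Class of each column as stored in the data frame.
col_classes <- sapply(testcase, class)
print(col_classes)

# apply() works on the matrix form of the data frame, so check what
# that matrix holds. A type that is neither numeric nor logical here
# would be consistent with the warning from mean.default().
m <- as.matrix(testcase)
print(typeof(m))
print(is.numeric(m) || is.logical(m))
```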
Summary:
If I have a data frame or a matrix in which one entire column is NA,
mean(x, na.rm=T) works on that column alone, returning NaN, but fails under
apply: apply returns NA for ALL columns, with a warning that the argument is
not numeric or logical.
lapply works fine on the data frame.
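A possible workaround, offered as a sketch rather than a diagnosis: sapply goes column by column like lapply but simplifies the result to a named vector, and creating the all-missing column with NA_real_ instead of NA makes it numeric from the start, so the matrix that apply builds is numeric too. The names testcase2, means_sapply and means_apply are hypothetical and used only for this illustration.

```r
testcase <- data.frame(x = 1:3, y = rep(NA, 3))

# Column-wise like lapply(), but returned as a named numeric vector.
means_sapply <- sapply(testcase, mean, na.rm = TRUE)
print(means_sapply)

# Hypothetical variant: store the all-missing column as numeric NA,
# so that as.matrix() inside apply() yields a numeric matrix.
testcase2 <- data.frame(x = 1:3, y = rep(NA_real_, 3))
means_apply <- apply(testcase2, 2, mean, na.rm = TRUE)
print(means_apply)
```

Whether the second variant is acceptable depends on how the simulation output is generated; sapply needs no change to the data at all.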
If you wonder why I'm building data frames with columns that could be
all missing -- they arise as output of a simulation. The fact that the
entire column is missing is informative in itself.
I do wonder if this is a bug.
Thanks,
Jim
Jim Robison-Cox
Department of Math Sciences      phone: (406)994-5340
2-214 Wilson Hall                FAX: (406)994-1789
Montana State University
Bozeman, MT 59717-2400           e-mail: jimrc at math.montana.edu