[R] Too many NA's from mean(..., na.rm=T) when a column is all NA's
Sundar Dorai-Raj
sundar.dorai-raj at pdf.com
Mon Jun 13 19:19:27 CEST 2005
Jim Robison-Cox wrote:
> Dear R-help folks,
>
> I am seeing unexpected behaviour from the function mean
> with option na.rm = TRUE when it removes every value in a column of a
> data frame or matrix.
>
> example:
>
> testcase <- data.frame( x = 1:3, y = rep(NA,3))
>
> mean(testcase[,1], na.rm=TRUE)
> [1] 2
> mean(testcase[,2], na.rm = TRUE)
> [1] NaN
>
> OK, so far that seems sensible. Now I'd like to compute both means at
> once:
>
> lapply(testcase, mean, na.rm=T) ## this works
> $x
> [1] 2
>
> $y
> [1] NaN
>
> But I thought that this would also work:
>
> apply(testcase, 2, mean, na.rm=T)
> x y
> NA NA
> Warning messages:
> 1: argument is not numeric or logical: returning NA in:
> mean.default(newX[, i], ...)
> 2: argument is not numeric or logical: returning NA in:
> mean.default(newX[, i], ...)
>
> Summary:
> If I have a data frame or a matrix where one entire column is NA,
> mean(x, na.rm=T) works on that column and returns NaN, but apply fails:
> it returns NA for ALL columns.
> lapply works fine on the data frame.
>
Did you try this with a "matrix" or just a data.frame?
> If you wonder why I'm building data frames with columns that could be
> all missing -- they arise as output of a simulation. The fact that the
> entire column is missing is informative in itself.
>
>
> I do wonder if this is a bug.
>
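Your problem is not ?apply but ?as.matrix, which apply calls. Hint: try
as.matrix(testcase) and see what it returns. Column y is logical, not
numeric, so as.matrix() turns the whole data frame into a character
matrix, and mean() on character data returns NA with exactly the
warnings you saw.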
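A minimal sketch of that diagnosis (my own illustration, not part of the original thread): it prints the mode of what as.matrix() makes of testcase, then uses sapply(), which, like lapply(), passes each column to mean() with its own type intact and returns a named vector instead of a list. The mode printed may depend on the R version; the warnings quoted above point to a character matrix in the version used in this thread.

```r
testcase <- data.frame(x = 1:3, y = rep(NA, 3))

# apply() calls as.matrix() first, and every column of the result shares one type.
# A "character" mode here would explain the
# "argument is not numeric or logical" warnings from mean().
m <- as.matrix(testcase)
print(mode(m))

# sapply() applies mean() to each column separately, keeping each column's type,
# and simplifies the per-column results to a named numeric vector.
col_means <- sapply(testcase, mean, na.rm = TRUE)
print(col_means)
```
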
If you need a matrix, why construct a data.frame? The following will
give you what you want:
x <- matrix(c(1:3, rep(NA, 3)), ncol = 2)
apply(x, 2, mean, na.rm = TRUE)
or better yet,
colMeans(x, na.rm = TRUE)
Note that colMeans may give NA instead of NaN for column 2. See
?colMeans for an explanation.
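Since an all-missing column is informative in the simulation output, one option is to flag such columns explicitly rather than relying on NaN versus NA, which, as noted above, can differ between mean() and colMeans(). A minimal sketch, assuming the same two-column matrix; the name all_missing is purely illustrative.

```r
x <- matrix(c(1:3, rep(NA, 3)), ncol = 2)

means <- colMeans(x, na.rm = TRUE)
# is.na() is TRUE for both NA and NaN, so this check works
# whichever of the two colMeans() returns for an empty column.
print(is.na(means))

# Count the non-missing entries per column; zero means the column is entirely NA.
all_missing <- colSums(!is.na(x)) == 0
print(all_missing)
```
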
HTH,
--sundar