problems with missing values created by conversion using as.matrix (PR#2130)
gregory_r_warnes@groton.pfizer.com
Wed, 9 Oct 2002 18:54:03 +0200 (MET DST)
> version
         _
platform sparc-sun-solaris2.8
arch     sparc
os       solaris2.8
system   sparc, solaris2.8
status
major    1
minor    6.0
year     2002
month    10
day      01
language R
------------------------------
Create a very simple data frame containing a factor and a numeric vector,
each containing a missing value:
> x <- data.frame( a=c("",NA), b=c(1,NA) )
Conversion to a matrix treats the two missing values differently:
> as.matrix(x)
a b
1 "" " 1"
2 NA "NA"
The missing value in the factor variable has been correctly converted to a
missing value, while the missing value in the numeric vector has been
incorrectly converted to a string "NA", which is not recognized as a missing
value:
> is.na(as.matrix(x))
a b
1 FALSE FALSE
2 TRUE FALSE
This turned up because I was using apply to check for rows containing only
blank or missing values:
> all.blank <- function(x) all( is.na(x) | (x <= " ") )
> blanks <- apply(x, 1, all.blank)
> blanks
1 2
FALSE FALSE
This should have yielded
> blanks
1 2
FALSE TRUE
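In the meantime, one way to get the expected answer is to avoid the implicit
as.matrix() call that apply() makes and test each column separately. The
following is only a sketch of that workaround: blank.col is a hypothetical
helper name, and stringsAsFactors = TRUE is set explicitly so that column a
is a factor as in the example above.

```r
x <- data.frame(a = c("", NA), b = c(1, NA), stringsAsFactors = TRUE)

# Hypothetical helper: TRUE where an entry of one column is missing or blank.
# as.character() keeps NA as a true missing value for factors and numerics.
blank.col <- function(col) {
  s <- as.character(col)
  is.na(s) | (s <= " ")
}

# One logical column per variable, then require every entry in a row to be
# blank. matrix() restores the shape if sapply() simplified to a vector.
flags <- sapply(x, blank.col)
blanks <- apply(matrix(flags, nrow = nrow(x)), 1, all)
blanks
```

Because the conversion happens column by column, the string "NA" never
appears and the second row is flagged as blank.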
BTW direct conversion using as.character doesn't show any problems when
applied to the individual columns:
> as.character(x$a)
[1] "" NA
> as.character(x$b)
[1] "1" NA
I think the problem is that as.matrix.data.frame uses format() to convert
non-character columns to character, which yields the string "NA" rather
than a missing value.
Why isn't it using as.character() for this?
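The difference between the two conversions can be seen directly on a numeric
vector, without involving a data frame at all. This is a minimal sketch; the
variable names are mine and are not taken from dataframe.R.

```r
# Numeric vector with a missing value, matching column b above.
b <- c(1, NA)

formatted <- format(b)        # pads to a common width; NA becomes the text "NA"
converted <- as.character(b)  # keeps NA as a true missing value

is.na(formatted)
is.na(converted)
```

Only the as.character() result still reports a missing value in the second
position.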
For completeness here's the patch to make this change, but I have not
explored what other side effects this might have.
*** R-1.6.0/src/library/base/R/dataframe.R Thu Aug 29 03:41:42 2002
--- R-1.6.0-GRW//src/library/base/R/dataframe.R Wed Oct 9 12:29:11 2002
***************
*** 931,937 ****
if (is.character(X[[j]]))
next
xj <- X[[j]]
! X[[j]] <- if(length(levels(xj))) as.vector(xj) else format(xj)
}
}
X <- unlist(X, recursive = FALSE, use.names = FALSE)
--- 931,937 ----
if (is.character(X[[j]]))
next
xj <- X[[j]]
! X[[j]] <- if(length(levels(xj))) as.vector(xj) else as.character(xj)
}
}
X <- unlist(X, recursive = FALSE, use.names = FALSE)
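The effect of the patch can be approximated without rebuilding R. The sketch
below uses a hypothetical helper, df_to_char_matrix, which is not part of
base R and only mimics the proposed per-column as.character() conversion; it
ignores the other cases that as.matrix.data.frame handles, such as all-numeric
data frames or zero-column input.

```r
# Hypothetical helper, not part of base R: build a character matrix from a
# data frame by converting each column with as.character().
df_to_char_matrix <- function(df) {
  cols <- lapply(df, as.character)
  matrix(unlist(cols, use.names = FALSE),
         nrow = nrow(df),
         dimnames = list(row.names(df), names(df)))
}

x <- data.frame(a = c("", NA), b = c(1, NA), stringsAsFactors = TRUE)
m <- df_to_char_matrix(x)
is.na(m)
```

One visible side effect is that as.character() does not pad numbers to a
common width the way format() does, so the entries of a numeric column are
no longer aligned.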
-Greg