problems with missing values created by conversion using as.matrix (PR#2130)

gregory_r_warnes@groton.pfizer.com gregory_r_warnes@groton.pfizer.com
Wed, 9 Oct 2002 18:54:03 +0200 (MET DST)


> version
         _                   
platform sparc-sun-solaris2.8
arch     sparc               
os       solaris2.8          
system   sparc, solaris2.8   
status                       
major    1                   
minor    6.0                 
year     2002                
month    10                  
day      01                  
language R                   

------------------------------

Create a very simple data frame containing a factor and a numeric vector,
each containing a missing value:

	> x <- data.frame( a=c("",NA), b=c(1,NA) ) 

Conversion to a matrix treats the two missing values differently:

	> as.matrix(x)
	  a  b   
	1 "" " 1"
	2 NA "NA"

The missing value in the factor variable has been correctly converted to a
missing value, while the missing value in the numeric vector has been
incorrectly converted to a string "NA", which is not recognized as a missing
value:

	>  is.na(as.matrix(x))
	      a     b
	1 FALSE FALSE
	2  TRUE FALSE

This turned up because I was using apply to check for rows containing only
blank or missing values:

	> all.blank <- function(x) all( is.na(x) | (x <= " ") )
	> blanks <- apply(x, 1, all.blank)
	> blanks
	    1     2 
	FALSE FALSE 

This should have yielded

	> blanks
	    1     2 
	FALSE  TRUE 
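
One way to get the expected result without going through as.matrix() is to
test each column separately, converting it with as.character() so that NA
stays NA. This is only a sketch: blank_rows is a hypothetical helper name,
and it treats whitespace-only strings as blank, which is not exactly the
same as the (x <= " ") comparison used above.

```r
# Hypothetical helper: flag rows whose entries are all missing or blank.
# Each column is converted with as.character(), which keeps NA as NA.
blank_rows <- function(df) {
  # One logical vector per column: TRUE where the entry is NA or whitespace-only
  per_col <- lapply(df, function(col) {
    s <- as.character(col)
    is.na(s) | !nzchar(trimws(s))
  })
  # A row counts as blank only when every column is blank in that row
  Reduce(`&`, per_col)
}

x <- data.frame(a = c("", NA), b = c(1, NA))
blanks <- blank_rows(x)
blanks
```

Because the check works column by column, no character matrix is ever built,
so the result does not depend on how as.matrix() encodes missing values.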


BTW direct conversion using as.character doesn't show any problems when
applied to the individual columns:

	> as.character(x$a)
	[1] "" NA
	> as.character(x$b)
	[1] "1" NA 

	
I think the problem is that as.matrix.data.frame is using format() to
convert non-character columns to character strings, which yields the string
"NA" rather than a missing value.

Why isn't it using as.character() for this?
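
The difference between the two conversions can be seen on a bare numeric
vector, independent of any data frame. A minimal sketch (the variable names
v, f and ch are only for illustration):

```r
v <- c(1, NA)

f  <- format(v)        # pads to a common width; NA becomes the string "NA"
ch <- as.character(v)  # no padding; NA stays a missing value

is.na(f)   # neither element is recognized as missing
is.na(ch)  # the second element is still missing
```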

For completeness, here's the patch to make this change, but I have not
explored what other side effects it might have.

*** R-1.6.0/src/library/base/R/dataframe.R      Thu Aug 29 03:41:42 2002
--- R-1.6.0-GRW//src/library/base/R/dataframe.R Wed Oct  9 12:29:11 2002
***************
*** 931,937 ****
            if (is.character(X[[j]]))
                next
            xj <- X[[j]]
!           X[[j]] <- if(length(levels(xj))) as.vector(xj) else format(xj)
        }
      }
      X <- unlist(X, recursive = FALSE, use.names = FALSE)
--- 931,937 ----
            if (is.character(X[[j]]))
                next
            xj <- X[[j]]
!           X[[j]] <- if(length(levels(xj))) as.vector(xj) else as.character(xj)
        }
      }
      X <- unlist(X, recursive = FALSE, use.names = FALSE)
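
One side effect of swapping format() for as.character() that is easy to
check is the loss of common formatting within a column. The sketch below
illustrates this on a stand-alone vector and shows a possible alternative
that keeps format() but restores the missing values afterwards. It is an
illustration under that assumption, not a tested change to dataframe.R; the
names xj, plain and out are used here only for the example.

```r
xj <- c(1, 10.5, NA)

# as.character(): NA is preserved, but each number is converted on its own,
# so the column loses its common width and number of decimals
plain <- as.character(xj)

# Alternative: keep format() for the layout, then mark the originally
# missing entries as missing again
out <- format(xj)
is.na(out) <- is.na(xj)

plain
out
```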



-Greg


-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._