[R] Deleting rows and columns containing NA's and "" only

syrvn mentor_ at gmx.net
Mon Feb 13 16:48:11 CET 2012


Hello,

I use read.xls from the gdata package to read in xlsx files. Sometimes the
resulting data.frames contain columns and rows that consist only of NA.
I know how to get rid of those, but here is the R output
of a test data set read in with read.xls:

> t1
     A          B         X         D               X.1         X.2
1 test      1         NA                        NA    
2 <NA>   asd    NA      asdasd    NA    
3                          NA      asdasd    NA    
4                          NA                        NA         NA

t1[2,1], t1[4,5] and t1[4,6] are NA in text form in the Excel sheet. I don't
understand why it is shown as <NA> in the first column but not in the last two.
I basically want to get rid of columns 5 and 6 and row 4, as they do not
contain any relevant information. If I run is.na.data.frame(t1):

> is.na.data.frame(t1)
         A     B    X     D  X.1   X.2
[1,] FALSE FALSE TRUE FALSE TRUE FALSE
[2,]  TRUE FALSE TRUE FALSE TRUE FALSE
[3,] FALSE FALSE TRUE FALSE TRUE FALSE
[4,] FALSE FALSE TRUE FALSE TRUE FALSE

it does not give me the result I hoped to get.

It seems that <NA> and NA are both treated as missing, but the value in
t1[4,6] is not (is.na returns FALSE for it), because if I do

> as.character(t1[4,6])
[1] "NA "

one can see that there is a whitespace character after NA, which is
definitely not in the Excel sheet.
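
The behaviour described here can be reproduced without the Excel file. This is only a small sketch with made-up values standing in for column X.2: a real missing value is printed as <NA> in a factor or character column and as NA in a numeric or logical one, whereas "NA " with a trailing space is an ordinary string, so is.na() reports FALSE for it. The vector x and the names below are assumptions for illustration, not the actual data.

```r
# Hypothetical values mimicking column X.2 of t1 (not the real data):
# three empty strings and the text "NA " with a trailing space.
x <- c("", "", "", "NA ")

# "NA " is ordinary text, so none of these count as missing.
is.na(x)

# Strip leading/trailing whitespace (base R; trimws() does the same in
# newer R versions), then turn "NA" and "" into real missing values.
x_trim <- gsub("^\\s+|\\s+$", "", x)
x_trim[x_trim %in% c("NA", "")] <- NA

# Now every element is a real NA.
is.na(x_trim)
```
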

I do not know how to deal with that...
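
One possible way to handle this, sketched under assumptions: treat a cell as blank if it is a real NA, an empty string, or the text "NA" once surrounding whitespace is removed, and then drop the rows and columns that contain only blank cells. The data frame below is a hand-built approximation of t1, and is_blank is a hypothetical helper name, not a function from gdata or base R.

```r
# Hypothetical reconstruction of t1; the real one comes from read.xls().
t1 <- data.frame(
  A   = c("test", NA, "", ""),
  B   = c("1", "asd", "", ""),
  X   = NA,
  D   = c("", "asdasd", "asdasd", ""),
  X.1 = NA,
  X.2 = c("", "", "", "NA "),
  stringsAsFactors = FALSE
)

# TRUE where a cell is a real NA, empty, or the text "NA" after trimming.
# as.character() also makes this work for factor columns.
is_blank <- function(col) {
  txt <- gsub("^\\s+|\\s+$", "", as.character(col))
  is.na(txt) | txt %in% c("", "NA")
}

# Logical matrix with one column per column of t1.
blank <- do.call(cbind, lapply(t1, is_blank))

# Keep only rows and columns with at least one non-blank cell.
t1_clean <- t1[rowSums(!blank) > 0, colSums(!blank) > 0, drop = FALSE]
t1_clean
```

With this reconstruction, row 4 and the columns X, X.1 and X.2 are dropped. It may also be possible to avoid the stray whitespace at import time, since read.xls forwards extra arguments to read.csv (for example strip.white = TRUE together with na.strings), but that is untested here.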

Cheers



