[R] a question on sqldf's handling of missing value and factor
xin wei
xinwei at stat.psu.edu
Wed Mar 2 16:14:44 CET 2011
Dear Mr. Grothendieck :
thank you so much for your attention. You are the real expert here. the
following is a mock text file:
a b c
aa 23
aaa 34
aaaa 77
note that both b and c column contain missing value (blank)
I save it under my C drive and use both read.table and sqldf to import it to
R and then use identical() function to compare the result. The following is
the result:
> setwd("c:/")
> library(sqldf)
> test <- file("test.txt")
> testx <- sqldf("select * from test",
+ dbname = tempfile(), file.format = list(header = T,
sep="\t", row.names = F))
> testy<- read.table("test.txt", header = T, sep="\t")
> identical(testx, testy)
[1] FALSE
> testx
a b c
1 aa 23.0
2 aaa 34.6 0.0
3 aaaa 77.8
> testy
a b c
1 aa NA 23.0
2 aaa 34.6 NA
3 aaaa NA 77.8
> class(testx$b)
[1] "factor"
> class(testy$b)
[1] "numeric"
>
read.table seems to get it right while sqldf treats b as factor (if I add
method="raw", b become character). what is more troubling is that column C
has number 0 at the second row while in the original file it is missing. In
my real world situation with a much larger text file, the problem is that
many cells are empty when they all actually have values in the original text
file.
I would greatly appreciate your help if you can shed some light on this.
thanks
--
View this message in context: http://r.789695.n4.nabble.com/a-question-on-sqldf-s-handling-of-missing-value-and-factor-tp3331007p3331662.html
Sent from the R help mailing list archive at Nabble.com.
More information about the R-help
mailing list