[Rd] Suggested change in merge.data.frame
David Kane <a296180@mica.fmr.com>
Mon, 20 May 2002 13:38:06 -0400
AFAIK, this is not a bug, since it could be a feature that merge.data.frame
coerces character vectors to factors in certain circumstances. (I think that
these circumstances require that all.x is TRUE, that y has fewer rows than x,
and that the coerced column comes from y.)
In any event, changing the line
    for (i in seq(along = y)) is.na(y[[i]]) <- (lxy + 1):(lxy + nxx)
to
    for (i in seq(along = y)) y[((lxy + 1):(lxy + nxx)), i] <- NA
solves the (or, rather, my) problem. The main reason that I hesitate to
recommend it heartily is that I thought that assigning NA directly was A Bad
Thing, but perhaps this is not so.
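For comparison, here is a minimal sketch of the two assignment idioms on a
plain character vector. It assumes a current R session, the names v1 and v2
are made up for illustration, and it does not reproduce the data frame case
inside merge.data.frame; it only shows that on a bare vector the two forms
agree.

```r
# Two ways to mark the 4th element of a character vector as missing.
# v1 and v2 are illustrative names, not taken from merge.data.frame.
v1 <- c("A", "B", "C", "D")
v2 <- v1

is.na(v1) <- 4   # replacement form of is.na()
v2[4] <- NA      # direct assignment of NA

identical(v1, v2)  # do both idioms give the same vector?
class(v1)          # is the vector still character?
```

Any difference between the two would therefore have to come from how they
behave on data frame columns or on classed objects such as factors, not on
plain vectors.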
If this is a bad idea, I would love to understand why. It seems to work fine
with the help page examples and in my own code. Here is a simple example:
> x <- data.frame(a = 1:4)
> y <- data.frame(b = LETTERS[1:3])
> y$b <- as.character(y$b)
> sapply(merge(x, y, by = 0, all.x = TRUE), data.class)
Row.names         a         b
 "factor" "integer"  "factor"
>
## Load up new merge.data.frame
> sapply(merge(x, y, by = 0, all.x = TRUE), data.class)
  Row.names           a           b
   "factor"   "integer" "character"
> merge(x, y, by = 0, all.x = TRUE)
  Row.names a    b
1         1 1    A
2         2 2    B
3         3 3    C
4         4 4 <NA>
>
Thanks to Brian Ripley for much explanation (which I may not have fully
understood) of the issues involved.
Regards,
Dave Kane