[Rd] Suggested change in merge.data.frame

David Kane <a296180@mica.fmr.com>
Mon, 20 May 2002 13:38:06 -0400


AFAIK, this is not a bug, since it could be a deliberate feature that
merge.data.frame coerces character vectors to factors in certain circumstances.
(I think these circumstances are that all.x is TRUE, that y has fewer rows than
x, and that the coerced column comes from y.)

In any event, changing the line 

for (i in seq(along = y)) is.na(y[[i]]) <- (lxy + 1):(lxy + nxx)

to 

for (i in seq(along = y)) y[(lxy + 1):(lxy + nxx), i] <- NA

solves the (or, rather, my) problem. The main reason I hesitate to recommend it
wholeheartedly is that I thought assigning NA directly was A Bad Thing, but
perhaps it is not.
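A minimal R sketch on toy data may help frame the question. The first part runs the proposed loop on a small y; lxy and nxx are illustrative stand-ins for the variables inside merge.data.frame, not its actual state at that point. The second part shows one case where direct NA assignment and is.na<- give different results: a factor that has NA as one of its levels. (Current R's help page for NA describes is.na<- as possibly a safer way to set missingness and notes that it behaves differently for factors.)

```r
## Part 1: the proposed form of the padding loop, on toy data.
## `lxy` and `nxx` are illustrative stand-ins (assumed values): the current
## row count of y and the number of rows to pad with NA.
y <- data.frame(b = LETTERS[1:3], stringsAsFactors = FALSE)
lxy <- nrow(y)
nxx <- 1
for (i in seq_along(y)) y[(lxy + 1):(lxy + nxx), i] <- NA
## Indexing past the last row makes [<-.data.frame add rows,
## and the character column stays character.
print(sapply(y, data.class))
print(y$b)

## Part 2: where direct NA assignment and is.na<- differ.
## A factor that carries NA as one of its levels:
f <- factor(c("a", "b", NA), exclude = NULL)
g <- f
g[1] <- NA     # [<-.factor matches NA against the levels, so it gets the NA level
is.na(f) <- 1  # is.na<-.factor sets the underlying integer code to NA
print(is.na(g))
print(is.na(f))
```

Since y[idx, i] <- NA ends up calling [<- on the column itself, a factor column in y that has an NA level would presumably receive that level under the proposed form, whereas the is.na<- form marks it as truly missing. For character vectors and ordinary factors the two forms agree.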

If this is a bad idea, I would love to understand why. It seems to work fine
with the help page examples and in my own code. Here is a simple example:

> x <- data.frame(a = 1:4)
> y <- data.frame(b = LETTERS[1:3])
> y$b <- as.character(y$b)
> sapply(merge(x, y, by = 0, all.x = TRUE), data.class)
Row.names         a         b 
 "factor" "integer"  "factor" 
>

## Load the patched merge.data.frame

> sapply(merge(x, y, by = 0, all.x = TRUE), data.class)
  Row.names           a           b 
   "factor"   "integer" "character" 
> merge(x, y, by = 0, all.x = TRUE)
  Row.names a    b
1         1 1    A
2         2 2    B
3         3 3    C
4         4 4 <NA>
> 
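The check in the transcript above can be kept as a small regression test. This is a minimal sketch assuming a current R version: it calls merge() as shipped rather than the patched function, and stringsAsFactors = FALSE only makes explicit what has been the data.frame() default since R 4.0.0.

```r
x <- data.frame(a = 1:4)
y <- data.frame(b = LETTERS[1:3], stringsAsFactors = FALSE)
m <- merge(x, y, by = 0, all.x = TRUE)
print(sapply(m, data.class))

## b should keep the class it had in y ...
stopifnot(is.character(m$b))
## ... and the row of x with no partner in y (row name "4") should get NA.
stopifnot(is.na(m$b[m$Row.names == "4"]))
```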

Thanks to Brian Ripley for much explanation (which I may not have fully
understood) of the issues involved.

Regards,

Dave Kane