merge.data.frame can coerce character vectors to factor in some circumstances (PR#1608)

a296180@agate.fmr.com a296180@agate.fmr.com
Wed, 29 May 2002 13:10:19 +0200 (MET DST)


If the following two conditions are met:

1) all.x is TRUE

2) at least 1 row in x does not have a match in y

then any character vectors in y will be coerced to be factors. Here is a simple
example (previously provided on r-devel):

> x <- data.frame(a = 1:4)
> y <- data.frame(b = LETTERS[1:3]) 
> y$b <- as.character(y$b) 
> z <- merge(x, y, by = 0, all.x = TRUE)
> z
  Row.names a    b
1         1 1    A
2         2 2    B
3         3 3    C
4         4 4 <NA>
> sapply(z, data.class)
Row.names         a         b 
 "factor" "numeric"  "factor" 
> 
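
To check whether a given R installation shows the coercion, the example above can be run as a script. This is only a sketch: the names b_before and b_after are introduced here for illustration, and the last line is a possible workaround rather than a fix to merge itself.

```r
# Rebuild the example from the report.
x <- data.frame(a = 1:4)
y <- data.frame(b = LETTERS[1:3])
y$b <- as.character(y$b)          # make sure b is character before merging

z <- merge(x, y, by = 0, all.x = TRUE)

# Record and print the class of b before and after the merge
# (b_before / b_after are illustrative names, not part of merge).
b_before <- class(y$b)
b_after  <- class(z$b)
cat("before:", b_before, " after:", b_after, "\n")

# Possible workaround: if b came back as a factor, convert it back.
if (is.factor(z$b)) z$b <- as.character(z$b)
```

After the last line z$b is character regardless of what merge returned, at the cost of an extra conversion step per affected column.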

This problem could be fixed by changing the line in merge.data.frame:

for (i in seq(along = y)) is.na(y[[i]]) <- (lxy + 1):(lxy + nxx) 

to:

for (i in seq(along = y)) y[((lxy + 1):(lxy + nxx)), i] <- NA 
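
The two forms can be compared outside of merge on a small data frame. The following is a sketch under assumptions: the data frame y0, its padding row, and the index rows are stand-ins invented here for the rows that merge.data.frame fills in for unmatched rows; they are not taken from the merge source.

```r
# Stand-in for y after merge has padded it: row 4 is a placeholder
# row that should end up as NA (y0 and rows are hypothetical names).
y0 <- data.frame(b = c("A", "B", "C", "A"))
y0$b <- as.character(y0$b)
rows <- 4

# Form currently used in merge.data.frame.
y1 <- y0
for (i in seq(along = y1)) is.na(y1[[i]]) <- rows

# Form proposed in this report.
y2 <- y0
for (i in seq(along = y2)) y2[rows, i] <- NA

# Compare the resulting column classes on this R version.
print(sapply(y1, class))
print(sapply(y2, class))
```

If the two printed classes differ, the first form is the one doing the coercion on that version of R.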

If this is a feature rather than a bug (in which case I would like to know
why), I suggest adding the following sentence to the documentation for merge,
at the end of the section on all.x:

"Be aware that, if all.x equals `TRUE', character vectors in `y' will be
converted to factors if any rows in `x' have no matching row in `y'."

Thanks,

Dave Kane
