merge.data.frame can coerce character vectors to factor
in some circumstances (PR#1608)
> If the following two conditions are met:
> 1) all.x is TRUE
> 2) at least 1 row in y does not have a match in x
> then any character vectors in y will be coerced to be factors. Here is a simple
> example (previously provided on r-devel):
> > x <- data.frame(a = 1:4)
> > y <- data.frame(b = LETTERS[1:3])
> > y$b <- as.character(y$b)
> > z <- merge(x, y, by = 0, all.x = TRUE)
> > z
>   Row.names a    b
> 1         1 1    A
> 2         2 2    B
> 3         3 3    C
> 4         4 4 <NA>
> > sapply(z, data.class)
> Row.names         a         b
>  "factor" "numeric"  "factor"
> >
> This problem could be fixed by changing the line in merge.data.frame:
> for (i in seq(along = y)) is.na(y[[i]]) <- (lxy + 1):(lxy + nxx)
> to:
> for (i in seq(along = y)) y[((lxy + 1):(lxy + nxx)), i] <- NA
But that change would introduce other problems, as the two operations
are not equivalent (and the code already uses the right one).
> To the extent that this is a feature rather than a bug (if so, I would like to
> know why),
I have already patiently explained it to you. It is a side effect of
subscripting data frames, which converts character columns to factors.
I have also given you a workaround.
> then I would suggest that the following sentence be added to the
> documentation for merge at the end of the section on all.x
> "Be aware that, if all.x equals `TRUE', character vectors in `y' will be
> converted to factors if any rows in y have no matching row in `x'."
As I said before, this is a consequence of the general rules. Data frames
are not designed to have character columns, and those who insist on using
them must make themselves aware of the consequences.
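
For readers who want to check for this after a merge, here is a minimal
sketch. It is not the workaround referred to above (that is not reproduced
in this message), and it assumes a current version of R, where assigning
a character vector with [[<- leaves it as character. It reruns the
reporter's example and converts any factor columns in the result back to
character.

```r
# Reporter's example, with b created as character from the start.
x <- data.frame(a = 1:4)
y <- data.frame(b = LETTERS[1:3], stringsAsFactors = FALSE)
z <- merge(x, y, by = 0, all.x = TRUE)

# Inspect the class of each column in the merged result.
print(sapply(z, data.class))

# Convert any column that came back as a factor to character.
# Assumption: [[<- on a data frame keeps character vectors as character,
# which holds in current R but may not in the version discussed here.
for (nm in names(z)) {
  if (is.factor(z[[nm]])) {
    z[[nm]] <- as.character(z[[nm]])
  }
}
```

The loop is a no-op when merge has not produced any factor columns, so it
can be applied unconditionally after the call.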
