merge.data.frame can coerce character vectors to factor in some circumstances (PR#1608)
Prof Brian D Ripley
ripley@stats.ox.ac.uk
Wed, 29 May 2002 12:34:32 +0100 (BST)
On Wed, 29 May 2002 a296180@agate.fmr.com wrote:
> If the following two conditions are met:
>
> 1) all.x is TRUE
>
> 2) at least 1 row in y does not have a match in x
>
> then any character vectors in y will be coerced to be factors. Here is a simple
> example (previously provided on r-devel):
>
> > x <- data.frame(a = 1:4)
> > y <- data.frame(b = LETTERS[1:3])
> > y$b <- as.character(y$b)
> > z <- merge(x, y, by = 0, all.x = TRUE)
> > z
>   Row.names a    b
> 1         1 1    A
> 2         2 2    B
> 3         3 3    C
> 4         4 4 <NA>
> > sapply(z, data.class)
> Row.names         a         b
>  "factor" "numeric"  "factor"
> >
>
> This problem could be fixed by changing the line in merge.data.frame:
>
> for (i in seq(along = y)) is.na(y[[i]]) <- (lxy + 1):(lxy + nxx)
>
> to:
>
> for (i in seq(along = y)) y[((lxy + 1):(lxy + nxx)), i] <- NA
But other problems would be introduced, as the two operations are
not equivalent (and the right one has been used).
> To the extent that this is a feature rather than a bug (if so, I would like to
> know why),
I have already patiently explained it to you. It is a side issue of the
subscripting of data frames, which converts character columns to factors.
I have also given you a workaround.
> then I would suggest that the following sentence be added to the
> documentation for merge at the end of the section on all.x
>
> "Be aware that, if all.x equals `TRUE', character vectors in `y' will be
> converted to factors if any rows in y have no matching row in `x'."
As I said before, this is a consequence of the general rules. Data frames
are not designed to have character columns, and those who insist on using
them must make themselves aware of the consequences.
--
Brian D. Ripley, ripley@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
r-devel mailing list