[R] a merge() problem

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Oct 8 07:37:07 CEST 2012


On 08/10/2012 02:57, Peter Ehlers wrote:
> On 2012-10-07 14:44, Sam Steingold wrote:
>>> * Peter Ehlers <ruyref at hpnytnel.pn> [2012-10-07 10:03:42 -0700]:
>>>
>>> On 2012-10-07 08:34, Sam Steingold wrote:
>>>> I know it does not look very good - using the same column names to mean
>>>> different things in different data frames, but here you go:
>>>> --8<---------------cut here---------------start------------->8---
>>>>> x <- data.frame(a=c(1,2,3),b=c(4,5,6))
>>>>> y <- data.frame(b=c(1,2),a=c("a","b"))
>>>>> merge(x,y,by.x="a",by.y="b",all.x=TRUE,suffixes=c("","y"))
>>>>     a b    a
>>>> 1 1 4    a
>>>> 2 2 5    b
>>>> 3 3 6 <NA>
>>>> Warning message:
>>>> In merge.data.frame(x, y, by.x = "a", by.y = "b", all.x = TRUE) :
>>>>     column name 'a' is duplicated in the result
>>>> --8<---------------cut here---------------end--------------->8---
>>>> why is the suffixes argument ignored?
>>>> I mean, I expected that the second "a" to be "a.y".
>>>
>>> The 'suffixes' argument refers to _non-by_ names only (as per ?merge).
>>
>> yes, but "a" in "y" is _not_ a by-name.
>
> Yes, it is.
> The set of by-names is the union of names specified by by.x and by.y,
> in your case: c("a", "b").
> I suppose that a case could be made that ?merge does not spell that
> out sufficiently explicitly.

It does, in 'Details' (and where else would there be such a detail?).
E.g. in R 2.15.1:

      If the remaining columns in the data frames have any common names,
      these have ‘suffixes’ (‘".x"’ and ‘".y"’ by default) appended to
      try to make the names of the result unique.  If this is not
      possible, an error is thrown.

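Note *remaining*, and read what comes before that.  Here the remaining
columns are "b" in x and "a" in y: they have no common name, so no
suffix is appended, and the result ends up with two columns called "a"
(the by-column from x and the remaining column from y).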
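
A minimal sketch of how this plays out for the data frames quoted above.
It assumes only base R; the names remaining.x, remaining.y and y2 are
made up for illustration and do not appear in ?merge.  Renaming before
the merge is just one possible workaround, not the only one.

```r
x <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
y <- data.frame(b = c(1, 2), a = c("a", "b"))

# The by-columns are x$a and y$b.  What remains is "b" in x and "a" in y.
remaining.x <- setdiff(names(x), "a")
remaining.y <- setdiff(names(y), "b")
intersect(remaining.x, remaining.y)   # character(0): nothing to suffix

# Suffixes do get used when the *remaining* columns share a name
# (y2 is a hypothetical variant of y with a differently named key).
y2 <- data.frame(key = c(1, 2), b = c("a", "b"))
names(merge(x, y2, by.x = "a", by.y = "key", all.x = TRUE))
# "a" "b.x" "b.y"

# One way to avoid the duplicated name in the original case:
# rename the clashing column in y before merging.
names(y)[names(y) == "a"] <- "a.y"
z <- merge(x, y, by.x = "a", by.y = "b", all.x = TRUE)
names(z)                              # "a" "b" "a.y"
```

The by-column in the result takes its name from x, which is why a
remaining column of y called "a" can collide with it without the
suffixes ever coming into play.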

>
> Peter Ehlers


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



