[Rd] Duplicate column names created by base::merge() when by.x has the same name as a column in y

Sat Feb 17 06:42:21 CET 2018

The attached patch.diff will make merge.data.frame() append the suffixes to
columns with common names between by.x and names(y).

Best,

Scott Ritchie

On 17 February 2018 at 11:15, Scott Ritchie <s.ritchie73 at gmail.com> wrote:

> Hi Frederick,
>
> I would expect that any duplicate names in the resulting data.frame would
> have the suffixes appended to them, regardless of whether or not they are
> used as the join key. So in my example I would expect "names.x" and
> "names.y" to indicate their source data.frame.
>
> While careful reading of the documentation reveals this is not the case, I
> would argue the intent of the suffixes functionality should equally be
> applied to this type of case.
>
> If you agree this would be useful, I'm happy to write a patch for
> merge.data.frame that will add suffixes in this case - I intend to do the
> same for merge.data.table in the data.table package where I initially
> encountered the edge case.
>
> Best,
>
> Scott
>
> On 17 February 2018 at 03:53, <frederik at ofb.net> wrote:
>
>> Hi Scott,
>>
>> It seems like reasonable behavior to me. What result would you expect?
>> That the second "name" should be called "name.y"?
>>
>> The "merge" documentation says:
>>
>>     If the columns in the data frames not used in merging have any
>>     common names, these have ‘suffixes’ (‘".x"’ and ‘".y"’ by default)
>>     appended to try to make the names of the result unique.
>>
>> Since the first "name" column was used in merging, leaving both
>> without a suffix seems consistent with the documentation...
>>
>> Frederick
>>
>> On Fri, Feb 16, 2018 at 09:08:29AM +1100, Scott Ritchie wrote:
>> > Hi,
>> >
>> > I was unable to find a bug report for this with a cursory search, but
>> would
>> > like clarification if this is intended or unavoidable behaviour:
>> >
>> > ```{r}
>> > # Create example data.frames
>> > parents <- data.frame(name=c("Sarah", "Max", "Qin", "Lex"),
>> >                       sex=c("F", "M", "F", "M"),
>> >                       age=c(41, 43, 36, 51))
>> > children <- data.frame(parent=c("Sarah", "Max", "Qin"),
>> >                        name=c("Oliver", "Sebastian", "Kai-lee"),
>> >                        sex=c("M", "M", "F"),
>> >                        age=c(5,8,7))
>> >
>> > # Merge() creates a duplicated "name" column:
>> > merge(parents, children, by.x = "name", by.y = "parent")
>> > ```
>> >
>> > Output:
>> > ```
>> >    name sex.x age.x      name sex.y age.y
>> > 1   Max     M    43 Sebastian     M     8
>> > 2   Qin     F    36   Kai-lee     F     7
>> > 3 Sarah     F    41    Oliver     M     5
>> > Warning message:
>> > In merge.data.frame(parents, children, by.x = "name", by.y = "parent") :
>> >   column name ‘name’ is duplicated in the result
>> > ```
>> >
>> > Kind Regards,
>> >
>> > Scott Ritchie
>> >
>> >       [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-devel at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>> >
>>
>
>

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patch.diff
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20180217/efabe387/attachment.ksh>