[Rd] Duplicate column names created by base::merge() when by.x has the same name as a column in y
Scott Ritchie
s.ritchie73 at gmail.com
Sat Feb 17 06:42:21 CET 2018
The attached patch.diff will make merge.data.frame() append the suffixes to
columns whose names appear in both by.x and names(y).
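
For illustration only, the behaviour described above can be approximated
without patching R by renaming the clashing columns before merging. This is a
minimal sketch and not the contents of patch.diff: the helper name
merge_suffix_keys() is hypothetical, and the data frames are the ones from the
original report quoted below.

```r
# Example data from the original report
parents <- data.frame(name = c("Sarah", "Max", "Qin", "Lex"),
                      sex  = c("F", "M", "F", "M"),
                      age  = c(41, 43, 36, 51))
children <- data.frame(parent = c("Sarah", "Max", "Qin"),
                       name   = c("Oliver", "Sebastian", "Kai-lee"),
                       sex    = c("M", "M", "F"),
                       age    = c(5, 8, 7))

# Hypothetical helper (not part of base R, not the attached patch):
# suffix any by.x column whose name also occurs among the non-key columns of y.
merge_suffix_keys <- function(x, y, by.x, by.y, suffixes = c(".x", ".y"), ...) {
  # Names used as a key in x that also exist in y without being a key of y
  clash <- setdiff(intersect(by.x, names(y)), by.y)
  if (length(clash) > 0) {
    # Rename the clashing columns on both sides, then point by.x at the new name
    names(x)[match(clash, names(x))] <- paste0(clash, suffixes[1])
    names(y)[match(clash, names(y))] <- paste0(clash, suffixes[2])
    by.x[match(clash, by.x)] <- paste0(clash, suffixes[1])
  }
  merge(x, y, by.x = by.x, by.y = by.y, suffixes = suffixes, ...)
}

result <- merge_suffix_keys(parents, children, by.x = "name", by.y = "parent")
names(result)
```

With the example data this should give the columns name.x, sex.x, age.x,
name.y, sex.y and age.y, with no duplicate-name warning. Renaming before the
merge keeps the sketch independent of the internals of merge.data.frame().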
Best,
Scott Ritchie
On 17 February 2018 at 11:15, Scott Ritchie <s.ritchie73 at gmail.com> wrote:
> Hi Frederick,
>
> I would expect that any duplicate names in the resulting data.frame would
> have the suffixes appended to them, regardless of whether or not they are
> used as the join key. So in my example I would expect "name.x" and
> "name.y" to indicate their source data.frame.
>
> While careful reading of the documentation reveals this is not the case, I
> would argue the intent of the suffixes functionality should equally be
> applied to this type of case.
>
> If you agree this would be useful, I'm happy to write a patch for
> merge.data.frame that will add suffixes in this case - I intend to do the
> same for merge.data.table in the data.table package where I initially
> encountered the edge case.
>
> Best,
>
> Scott
>
> On 17 February 2018 at 03:53, <frederik at ofb.net> wrote:
>
>> Hi Scott,
>>
>> It seems like reasonable behavior to me. What result would you expect?
>> That the second "name" should be called "name.y"?
>>
>> The "merge" documentation says:
>>
>> If the columns in the data frames not used in merging have any
>> common names, these have ‘suffixes’ (‘".x"’ and ‘".y"’ by default)
>> appended to try to make the names of the result unique.
>>
>> Since the first "name" column was used in merging, leaving both
>> without a suffix seems consistent with the documentation...
>>
>> Frederick
>>
>> On Fri, Feb 16, 2018 at 09:08:29AM +1100, Scott Ritchie wrote:
>> > Hi,
>> >
>> > I was unable to find a bug report for this with a cursory search, but would
>> > like clarification if this is intended or unavoidable behaviour:
>> >
>> > ```{r}
>> > # Create example data.frames
>> > parents <- data.frame(name=c("Sarah", "Max", "Qin", "Lex"),
>> >                       sex=c("F", "M", "F", "M"),
>> >                       age=c(41, 43, 36, 51))
>> > children <- data.frame(parent=c("Sarah", "Max", "Qin"),
>> >                        name=c("Oliver", "Sebastian", "Kai-lee"),
>> >                        sex=c("M", "M", "F"),
>> >                        age=c(5,8,7))
>> >
>> > # merge() creates a duplicated "name" column:
>> > merge(parents, children, by.x = "name", by.y = "parent")
>> > ```
>> >
>> > Output:
>> > ```
>> >    name sex.x age.x      name sex.y age.y
>> > 1   Max     M    43 Sebastian     M     8
>> > 2   Qin     F    36   Kai-lee     F     7
>> > 3 Sarah     F    41    Oliver     M     5
>> > Warning message:
>> > In merge.data.frame(parents, children, by.x = "name", by.y = "parent") :
>> >   column name ‘name’ is duplicated in the result
>> > ```
>> >
>> > Kind Regards,
>> >
>> > Scott Ritchie
>> >
>>
>
>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patch.diff
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20180217/efabe387/attachment.ksh>