[Rd] merge bug fix in R 2.15.0

Sun Mar 18 21:50:30 CET 2012

Hi,

I'm not sure I follow ... I think we're in total agreement, but it
sounds like you're suggesting we aren't.

On Sun, Mar 18, 2012 at 4:40 PM, Peter Meilstrup
<peter.meilstrup at gmail.com> wrote:
> On Sun, Mar 18, 2012 at 12:48 PM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
[snip]

>> > Right, the user is now protected against confusing himself by using names
>> > that were not unique before the merge.
>>
>> ... now I'm confused :-)
>>
>> If the user explicitly asks for a NULL/0/empty/whatever suffix,
>> they're not really going to be confusing themselves, right?
>
>
> If the user asks for a blank suffix and you still give back ".x" or ".y" as
> a suffix, then yes that is confusing.

I agree, that is confusing -- where did this happen?

> As a user I would expect that the rule for column names produced by "merge"
> would be simple: the output column name is the concatenation of the input
> column name and the corresponding suffix.

Total agreement here.

> When I use "merge" I don't expect
> a more complicated behavior that somehow still uses '.x' even though I asked
> it not to, as in your second example. So I would say that the new behavior
> is more consistent.

But it didn't "still use '.x'" ... it didn't do anything.

There was a column name in the original table that ended with '.x' and
it wasn't changed since the call to merge asked for a blank suffix.
These were the two data.frames, for reference:

d1 <- data.frame(a=letters[1:10], b=rnorm(10), b.x=tail(letters, 10))
d2 <- data.frame(a=letters[1:10], b=101:110)

If you had those two data.frames, and you did this:

merge(d1, d2, by='a', suffixes=c("", ".y"))

How is the following result surprising?

   a           b b.x b.y
1  a -1.52250626   q 101
2  b -0.99865341   r 102
...
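
For anyone who wants to reproduce this, here is a minimal sketch. The set.seed() call and the variable name `m` are additions for illustration, not part of the original exchange. The rnorm() values will differ from the output above, so the sketch only looks at the column names, which on a recent R should come out as a, b, b.x, b.y.

```r
# Assumption: the seed is arbitrary; it is only here to make reruns repeatable.
set.seed(1)

# The two data.frames from the message above.
d1 <- data.frame(a=letters[1:10], b=rnorm(10), b.x=tail(letters, 10))
d2 <- data.frame(a=letters[1:10], b=101:110)

# Blank suffix for the left table, ".y" for the right one.
m <- merge(d1, d2, by='a', suffixes=c("", ".y"))

# Inspect the resulting column names and check they are all distinct.
print(names(m))
print(anyDuplicated(names(m)) == 0)
```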

> When I write functions that use "merge" on general data frames, I can
> anticipate and use the simpler rule, but it is difficult to anticipate the
> results of the more complicated rule in a way that my subsequent lines of
> code will work.
>
> If the inputs I give to merge are inconsistent with the simple rule

I agree that the rule should be simple. I'm not sure why asking for a
blank ("") suffix somehow isn't simple.

> I would
> much rather have an exception (highlighting exactly where my code has gone
> wrong) than a surprising column name change (which makes my code
> mysteriously fail ten or a hundred lines later).

What was "the surprising name change" you are referring to?
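
If the goal is an error at the merge call instead of a puzzling name later on, something like the following sketch might do. The helper name merge_strict is hypothetical and not part of base R. It assumes `by` is a character vector of column names shared by both tables, predicts the output names under the simple "input name plus suffix" rule quoted above, and stops if that prediction contains duplicates.

```r
# Hypothetical wrapper (not in base R): fail early when the simple
# "column name + suffix" rule would yield duplicated output names.
merge_strict <- function(x, y, by, suffixes = c(".x", ".y"), ...) {
  nx <- setdiff(names(x), by)      # non-key columns of x
  ny <- setdiff(names(y), by)      # non-key columns of y
  common <- intersect(nx, ny)      # names that need a suffix

  # Apply the simple rule: only shared names get a suffix appended.
  out_x <- ifelse(nx %in% common, paste0(nx, suffixes[1]), nx)
  out_y <- ifelse(ny %in% common, paste0(ny, suffixes[2]), ny)

  expected <- c(by, out_x, out_y)
  dup <- unique(expected[duplicated(expected)])
  if (length(dup) > 0) {
    stop("merge would produce duplicated column names: ",
         paste(dup, collapse = ", "))
  }
  merge(x, y, by = by, suffixes = suffixes, ...)
}

d1 <- data.frame(a=letters[1:10], b=rnorm(10), b.x=tail(letters, 10))
d2 <- data.frame(a=letters[1:10], b=101:110)

# Default suffixes: "b" from d1 would become "b.x", clashing with the
# existing "b.x" column, so the wrapper stops with an error.
res_default <- tryCatch(merge_strict(d1, d2, by = "a"),
                        error = function(e) conditionMessage(e))

# Blank suffix for the left table: predicted names are all distinct,
# so the call goes through to merge().
res_blank <- merge_strict(d1, d2, by = "a", suffixes = c("", ".y"))
```

With the two data.frames from this thread, the default suffixes trip the check, while suffixes=c("", ".y") passes it.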

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact