[Rd] merge bug fix in R 2.15.0
Steve Lianoglou
mailinglist.honeypot at gmail.com
Sun Mar 18 21:50:30 CET 2012
Hi,
I'm not sure I follow ... I think we're in total agreement, but it
sounds like you're suggesting we aren't.
On Sun, Mar 18, 2012 at 4:40 PM, Peter Meilstrup
<peter.meilstrup at gmail.com> wrote:
> On Sun, Mar 18, 2012 at 12:48 PM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
[snip]
>> > Right, the user is now protected against confusing himself by using
>> > names that were not unique before the merge.
>>
>> ... now I'm confused :-)
>>
>> If the user explicitly asks for a NULL/0/empty/whatever suffix,
>> they're not really going to be confusing themselves, right?
>
>
> If the user asks for a blank suffix and you still give back ".x" or ".y" as
> a suffix, then yes that is confusing.
I agree, that is confusing -- where did this happen?
> As a user I would expect that the rule for column names produced by "merge"
> would be simple: the output column name is the concatenation of the input
> column name and the corresponding suffix.
Total agreement here.
> When I use "merge" I don't expect
> a more complicated behavior that somehow still uses '.x' even though I asked
> it not to, as in your second example. So I would say that the new behavior
> is more consistent.
But it didn't "still use '.x'" ... it didn't do anything.
There was a column name in the original table that ended with '.x' and
it wasn't changed since the call to merge asked for a blank suffix.
These were the two data.frames, for reference:
d1 <- data.frame(a=letters[1:10], b=rnorm(10), b.x=tail(letters, 10))
d2 <- data.frame(a=letters[1:10], b=101:110)
If you had those two data.frames, and you did this:
merge(d1, d2, by='a', suffixes=c("", ".y"))
How is the following result surprising?
   a           b b.x b.y
1  a -1.52250626   q 101
2  b -0.99865341   r 102
...
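
For anyone who wants to reproduce this, here is a minimal sketch. The set.seed() call and the variable name `m` are assumptions added for illustration and were not part of the original example, so the values in column b will not match the output above.

```r
# set.seed() is an assumption added here so that rnorm() is reproducible;
# the original example did not fix a seed.
set.seed(1)
d1 <- data.frame(a = letters[1:10], b = rnorm(10), b.x = tail(letters, 10))
d2 <- data.frame(a = letters[1:10], b = 101:110)

# Ask for a blank suffix on the columns coming from d1.
m <- merge(d1, d2, by = "a", suffixes = c("", ".y"))

# The shared column "b" keeps its name on the d1 side and becomes "b.y"
# on the d2 side; the pre-existing "b.x" column is left untouched.
print(names(m))
```

The point of the check is only that each output name is the input name plus the requested suffix, and that b.x is a column that already existed in d1.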
> When I write functions that use "merge" on general data frames, I can
> anticipate and use the simpler rule, but it is difficult to anticipate the
> results of the more complicated rule in a way that my subsequent lines of
> code will work.
>
> If the inputs I give to merge are inconsistent with the simple rule
I agree that the rule should be simple. I'm not sure why asking for a
blank ("") suffix somehow isn't simple.
> I would
> much rather have an exception (highlighting exactly where my code has gone
> wrong) than a surprising column name change (which makes my code
> mysteriously fail ten or a hundred lines later).
What was "the surprising name change" you are referring to?
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact