[Rd] PR#9299:Re: Bugs with partial name matching during partial replacement (PR#9299)

Mon Oct 16 21:54:53 CEST 2006

On Mon, 16 Oct 2006, hin-tak.leung at cimr.cam.ac.uk wrote:

> This is rather interesting, but I don't think it is a bug - it is
> just one of those things that "you are not supposed to do"

It was a bug. It has been fixed in R 2.4.0. Unfortunately, since you 
didn't quote the PR# of the original bug in the subject line you have just 
filed a new bug report for it.
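
Independently of the fix, the intent can be made explicit by creating the 
new column before assigning into part of it, since an exact name match 
always takes precedence over a partial one. What follows is only a minimal 
sketch: the three-row data frame is a made-up stand-in for the one in the 
report, and the use of [[ ]] assumes a current version of R, where it 
matches names exactly by default.

```r
# Hypothetical small data frame standing in for the one in the report.
D <- data.frame(M = c(TRUE, TRUE, FALSE), V = 1:3, ABCD = 3:1)

# Create the new column first, so that later lookups of "ABC"
# find an exact match and never fall back to "ABCD".
D[["ABC"]] <- NA_integer_

# Fill in only the rows selected by M; the other rows stay NA.
D[["ABC"]][D$M] <- D$V[D$M]

print(D)
```

With the column created up front, ABCD is left untouched and the rows of 
ABC that are not assigned remain NA.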

 	-thomas

>	... you are assuming
> a certain evaluation order of the 4 "$" operators in
> " D$ABC[D$M] = D$V[D$M] " as in:
>
> temp1 <- D$M                 # 2nd and 4th
> temp2 <- D$V[temp1]          # 3rd
> D$ABC[temp1] = temp2         # 1st
>
> What R did was this:
>
> temp4 <- D$ABC         # make reference, expand to D$ABCD , 1st
> temp1 <- D$M           # 2nd, and 4th
> temp2 <- D$V[temp1]    # 3rd
>
> temp4[temp1] <- temp2  # oh dear, it looks as if we are
> D$ABC <- temp4         # trying to write to a reference,
>                        # better make a copy instead
>
> R is doing the 4 $'s roughly from left to right, if you have some idea
> of how R works inside. (I am not saying this behavior is a "good" thing,
> but at least it is consistent). Basically it is a very bad habit to
> write code that depends on the evaluation order of operators at the same
> precedence.
>
> The difference in behavior in the two cases is probably due to
> coercion, (and also how lazily R does make-a-reference versus "oops, you
> seem to be trying to write to a reference so I had better copy it") but
> I'll leave you to think about what order R is doing the combination of
> the 4 $'s and coercing between types... Basically writing code that
> depends on evaluation order is a bad idea.
>
> cf. this bit of C code:
>
> int i = 0;
> i = ++i + ++i;
>
> what value do you think "i" should be?
>
> amaliy1 at uic.edu wrote:
>> Hello,
>>
>> First the version info:
>> platform       powerpc-apple-darwin8.6.0
>> arch           powerpc
>> os             darwin8.6.0
>> system         powerpc, darwin8.6.0
>> status
>> major          2
>> minor          3.1
>> year           2006
>> month          06
>> day            01
>> svn rev        38247
>> language       R
>> version.string Version 2.3.1 (2006-06-01)
>>
>> I have encountered some unusual behavior when trying to create new
>> columns in a data frame that have names that would generate a partial
>> match with an existing column with a longer name.  It is my
>> understanding that replacement operations shouldn't have partial
>> matching, but it is not clear to me whether this applies only when
>> the named column exists and not for new assignments.
>>
>> The first example:
>>
>> > D = data.frame(M=c(T,T,F,F,F,T,F,T,F,F,T,T,T),V=I(sprintf("ZZ%02d",
>> 1:13)),ABCD=13:1)
>> > D
>>         M    V ABCD
>> 1   TRUE ZZ01   13
>> 2   TRUE ZZ02   12
>> 3  FALSE ZZ03   11
>> 4  FALSE ZZ04   10
>> 5  FALSE ZZ05    9
>> 6   TRUE ZZ06    8
>> 7  FALSE ZZ07    7
>> 8   TRUE ZZ08    6
>> 9  FALSE ZZ09    5
>> 10 FALSE ZZ10    4
>> 11  TRUE ZZ11    3
>> 12  TRUE ZZ12    2
>> 13  TRUE ZZ13    1
>> > D$CBA[D$M] = D$V[D$M]
>> > D
>>         M    V ABCD  CBA
>> 1   TRUE ZZ01   13 ZZ01
>> 2   TRUE ZZ02   12 ZZ02
>> 3  FALSE ZZ03   11 <NA>
>> 4  FALSE ZZ04   10 <NA>
>> 5  FALSE ZZ05    9 <NA>
>> 6   TRUE ZZ06    8 ZZ06
>> 7  FALSE ZZ07    7 <NA>
>> 8   TRUE ZZ08    6 ZZ08
>> 9  FALSE ZZ09    5 <NA>
>> 10 FALSE ZZ10    4 <NA>
>> 11  TRUE ZZ11    3 ZZ11
>> 12  TRUE ZZ12    2 ZZ12
>> 13  TRUE ZZ13    1 ZZ13
>> > D$ABC[D$M] = D$V[D$M]
>> > D
>>         M    V ABCD  CBA  ABC
>> 1   TRUE ZZ01   13 ZZ01 ZZ01
>> 2   TRUE ZZ02   12 ZZ02 ZZ02
>> 3  FALSE ZZ03   11 <NA>   11
>> 4  FALSE ZZ04   10 <NA>   10
>> 5  FALSE ZZ05    9 <NA>    9
>> 6   TRUE ZZ06    8 ZZ06 ZZ06
>> 7  FALSE ZZ07    7 <NA>    7
>> 8   TRUE ZZ08    6 ZZ08 ZZ08
>> 9  FALSE ZZ09    5 <NA>    5
>> 10 FALSE ZZ10    4 <NA>    4
>> 11  TRUE ZZ11    3 ZZ11 ZZ11
>> 12  TRUE ZZ12    2 ZZ12 ZZ12
>> 13  TRUE ZZ13    1 ZZ13 ZZ13
>>
>> I expected ABC to equal CBA with NA values in rows not assigned, but
>> instead it appears that an extraction from D$ABCD and coercion to
>> string is being performed in the process of creating D$ABC.
>>
>> Here is something I believe is definitely a bug:
>>
>> > D = data.frame(M=c(T,T,F,F,F,T,F,T,F,F,T,T,T),V=1:13,ABCD=13:1)
>> > D
>>         M  V ABCD
>> 1   TRUE  1   13
>> 2   TRUE  2   12
>> 3  FALSE  3   11
>> 4  FALSE  4   10
>> 5  FALSE  5    9
>> 6   TRUE  6    8
>> 7  FALSE  7    7
>> 8   TRUE  8    6
>> 9  FALSE  9    5
>> 10 FALSE 10    4
>> 11  TRUE 11    3
>> 12  TRUE 12    2
>> 13  TRUE 13    1
>> > D$CBA[D$M] = D$V[D$M]
>> > D
>>         M  V ABCD CBA
>> 1   TRUE  1   13   1
>> 2   TRUE  2   12   2
>> 3  FALSE  3   11  NA
>> 4  FALSE  4   10  NA
>> 5  FALSE  5    9  NA
>> 6   TRUE  6    8   6
>> 7  FALSE  7    7  NA
>> 8   TRUE  8    6   8
>> 9  FALSE  9    5  NA
>> 10 FALSE 10    4  NA
>> 11  TRUE 11    3  11
>> 12  TRUE 12    2  12
>> 13  TRUE 13    1  13
>> > D$ABC[D$M] = D$V[D$M]
>> > D
>>         M  V ABCD CBA ABC
>> 1   TRUE  1    1   1   1
>> 2   TRUE  2    2   2   2
>> 3  FALSE  3   11  NA  11
>> 4  FALSE  4   10  NA  10
>> 5  FALSE  5    9  NA   9
>> 6   TRUE  6    6   6   6
>> 7  FALSE  7    7  NA   7
>> 8   TRUE  8    8   8   8
>> 9  FALSE  9    5  NA   5
>> 10 FALSE 10    4  NA   4
>> 11  TRUE 11   11  11  11
>> 12  TRUE 12   12  12  12
>> 13  TRUE 13   13  13  13
>>
>> ABC is created as before with values from ABCD in the unassigned
>> rows, but ABCD is being modified as well.  The only difference from
>> the previous example is that V is now just a numeric column.
>>
>> Anil Maliyekkel
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle