[Rd] PR#9299:Re: Bugs with partial name matching during partial replacement (PR#9299)
Thomas Lumley
tlumley at u.washington.edu
Mon Oct 16 21:54:53 CEST 2006
On Mon, 16 Oct 2006, hin-tak.leung at cimr.cam.ac.uk wrote:
> This is rather interesting, but I don't think it is a bug; it is
> just one of the things that "you are not supposed to do"
It was a bug. It has been fixed in R 2.4.0. Unfortunately, since you
didn't quote the PR# of the original bug in the subject line you have just
filed a new bug report for it.
-thomas
> ... you are assuming
> a certain evaluation order of the 4 "$" operators in
> " D$ABC[D$M] = D$V[D$M] " as in:
>
> temp1 <- D$M # 2nd and 4th
> temp2 <- D$V[temp1] # 3rd
> D$ABC[temp1] = temp2 # 1st
>
> What R did was this:
>
> temp4 <- D$ABC # make reference, expand to D$ABCD , 1st
> temp1 <- D$M # 2nd, and 4th
> temp2 <- D$V[temp1] # 3rd
>
> temp4[temp1] <- temp2 # oh dear, it looks as if we are
> D$ABC <- temp4 # trying to write to a reference,
> # better make a copy instead
>
> R is doing the 4 $'s roughly from left to right, if you have some ideas
> how R works inside. (I am not saying this behavior is a "good" thing,
> but at least it is consistent). Basically it is a very bad habit to
> write code that depends on evaluation order of operators at the same
> precedence.
>
> The difference in behavior in the two cases is probably due to
> coercion, (and also how lazy R does make-a-reference versus "oops, you
> seem to be trying to write to a reference so I better copy it") but
> I'll leave you to think about what order R is doing the combination of
> the 4 $'s and coercing between types... Basically writing code that
> depends on evaluation order is a bad idea.
>
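A minimal sketch of the expansion described above, written out by hand. It follows the general scheme the R Language Definition gives for nested replacement calls (a temporary copy, an inner `[<-`, then an outer `$<-`). The data frame is a small hypothetical stand-in for the one in the report, and `tmp` is an illustrative name for R's internal temporary.

```r
# Hypothetical small data frame; column names mirror the report.
D <- data.frame(M = c(TRUE, FALSE, TRUE),
                V = 1:3,
                ABCD = c(30L, 20L, 10L))

# Hand-written expansion of  D$ABC[D$M] <- D$V[D$M]
tmp <- D
old <- tmp$ABC                                   # extraction: partial match picks up ABCD
new <- `[<-`(old, tmp$M, value = tmp$V[tmp$M])   # overwrite only the selected rows
D   <- `$<-`(tmp, "ABC", value = new)            # replacement: exact name, new column

D$ABC    # 1 20 3, the unselected row carries the ABCD value
D$ABCD   # 30 20 10, unchanged in this explicit form
```

This reproduces the symptom of the first example by construction. It makes no claim about what any particular R version does for the one-line form.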
> c.f. this bit of C code:
>
> i =0;
> i = ++i + ++i;
>
> what value do you think "i" should be?
>
> amaliy1 at uic.edu wrote:
>> Hello,
>>
>> First the version info:
>> platform powerpc-apple-darwin8.6.0
>> arch powerpc
>> os darwin8.6.0
>> system powerpc, darwin8.6.0
>> status
>> major 2
>> minor 3.1
>> year 2006
>> month 06
>> day 01
>> svn rev 38247
>> language R
>> version.string Version 2.3.1 (2006-06-01)
>>
>> I have encountered some unusual behavior when trying to create new
>> columns in a data frame that have names that would generate a partial
>> match with an existing column with a longer name. It is my
>> understanding that replacement operations shouldn't have partial
>> matching, but it is not clear to me whether this applies only when
>> the named column exists and not for new assignments.
>>
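A short sketch of the distinction raised here, assuming a current R with default options: extraction with `$` may partially match a longer name, while `[[` matches exactly by default and replacement with `$<-` uses the exact name. The data frame `X` is hypothetical.

```r
# Hypothetical one-column data frame.
X <- data.frame(ABCD = 3:1)

by_dollar  <- X$ABC        # extraction by $: partial match on ABCD
by_bracket <- X[["ABC"]]   # [[ matches exactly by default, so this is NULL

X$ABC <- rep(0L, nrow(X))  # replacement: creates a new column named ABC
names(X)                   # "ABCD" "ABC"
```

Setting `options(warnPartialMatchDollar = TRUE)` makes R warn when `$` extraction falls back to a partial match, which can help catch this kind of surprise.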
>> The first example:
>>
>> > D = data.frame(M=c(T,T,F,F,F,T,F,T,F,F,T,T,T),V=I(sprintf("ZZ%02d",1:13)),ABCD=13:1)
>> > D
>> M V ABCD
>> 1 TRUE ZZ01 13
>> 2 TRUE ZZ02 12
>> 3 FALSE ZZ03 11
>> 4 FALSE ZZ04 10
>> 5 FALSE ZZ05 9
>> 6 TRUE ZZ06 8
>> 7 FALSE ZZ07 7
>> 8 TRUE ZZ08 6
>> 9 FALSE ZZ09 5
>> 10 FALSE ZZ10 4
>> 11 TRUE ZZ11 3
>> 12 TRUE ZZ12 2
>> 13 TRUE ZZ13 1
>> > D$CBA[D$M] = D$V[D$M]
>> > D
>> M V ABCD CBA
>> 1 TRUE ZZ01 13 ZZ01
>> 2 TRUE ZZ02 12 ZZ02
>> 3 FALSE ZZ03 11 <NA>
>> 4 FALSE ZZ04 10 <NA>
>> 5 FALSE ZZ05 9 <NA>
>> 6 TRUE ZZ06 8 ZZ06
>> 7 FALSE ZZ07 7 <NA>
>> 8 TRUE ZZ08 6 ZZ08
>> 9 FALSE ZZ09 5 <NA>
>> 10 FALSE ZZ10 4 <NA>
>> 11 TRUE ZZ11 3 ZZ11
>> 12 TRUE ZZ12 2 ZZ12
>> 13 TRUE ZZ13 1 ZZ13
>> > D$ABC[D$M] = D$V[D$M]
>> > D
>> M V ABCD CBA ABC
>> 1 TRUE ZZ01 13 ZZ01 ZZ01
>> 2 TRUE ZZ02 12 ZZ02 ZZ02
>> 3 FALSE ZZ03 11 <NA> 11
>> 4 FALSE ZZ04 10 <NA> 10
>> 5 FALSE ZZ05 9 <NA> 9
>> 6 TRUE ZZ06 8 ZZ06 ZZ06
>> 7 FALSE ZZ07 7 <NA> 7
>> 8 TRUE ZZ08 6 ZZ08 ZZ08
>> 9 FALSE ZZ09 5 <NA> 5
>> 10 FALSE ZZ10 4 <NA> 4
>> 11 TRUE ZZ11 3 ZZ11 ZZ11
>> 12 TRUE ZZ12 2 ZZ12 ZZ12
>> 13 TRUE ZZ13 1 ZZ13 ZZ13
>>
>> I expected ABC to equal CBA with NA values in rows not assigned, but
>> instead it appears that an extraction from D$ABCD and coercion to
>> string is being performed in the process of creating D$ABC.
>>
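One way to get the result expected here without relying on how the partial assignment is evaluated is to create the new column explicitly before filling it. This is only a sketch with a hypothetical three-row data frame, not a statement about the fix made in R itself.

```r
# Hypothetical small data frame; column names mirror the report.
D <- data.frame(M = c(TRUE, FALSE, TRUE),
                V = 1:3,
                ABCD = c(30L, 20L, 10L))

# Create ABC up front so later lookups find an exact match.
D[["ABC"]] <- rep(NA_integer_, nrow(D))

# Fill only the rows selected by M.
D[["ABC"]][D$M] <- D$V[D$M]

D$ABC    # 1 NA 3
D$ABCD   # 30 20 10
```

Because `ABC` already exists when the partial assignment runs, the extraction step finds it by exact name and never falls through to `ABCD`.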
>> Here is something I believe is definitely a bug:
>>
>> > D = data.frame(M=c(T,T,F,F,F,T,F,T,F,F,T,T,T),V=1:13,ABCD=13:1)
>> > D
>> M V ABCD
>> 1 TRUE 1 13
>> 2 TRUE 2 12
>> 3 FALSE 3 11
>> 4 FALSE 4 10
>> 5 FALSE 5 9
>> 6 TRUE 6 8
>> 7 FALSE 7 7
>> 8 TRUE 8 6
>> 9 FALSE 9 5
>> 10 FALSE 10 4
>> 11 TRUE 11 3
>> 12 TRUE 12 2
>> 13 TRUE 13 1
>> > D$CBA[D$M] = D$V[D$M]
>> > D
>> M V ABCD CBA
>> 1 TRUE 1 13 1
>> 2 TRUE 2 12 2
>> 3 FALSE 3 11 NA
>> 4 FALSE 4 10 NA
>> 5 FALSE 5 9 NA
>> 6 TRUE 6 8 6
>> 7 FALSE 7 7 NA
>> 8 TRUE 8 6 8
>> 9 FALSE 9 5 NA
>> 10 FALSE 10 4 NA
>> 11 TRUE 11 3 11
>> 12 TRUE 12 2 12
>> 13 TRUE 13 1 13
>> > D$ABC[D$M] = D$V[D$M]
>> > D
>> M V ABCD CBA ABC
>> 1 TRUE 1 1 1 1
>> 2 TRUE 2 2 2 2
>> 3 FALSE 3 11 NA 11
>> 4 FALSE 4 10 NA 10
>> 5 FALSE 5 9 NA 9
>> 6 TRUE 6 6 6 6
>> 7 FALSE 7 7 NA 7
>> 8 TRUE 8 8 8 8
>> 9 FALSE 9 5 NA 5
>> 10 FALSE 10 4 NA 4
>> 11 TRUE 11 11 11 11
>> 12 TRUE 12 12 12 12
>> 13 TRUE 13 13 13 13
>>
>> ABC is created as before with values from ABCD in the unassigned
>> rows, but ABCD is being modified as well. The only difference from
>> the previous example is that V is now just a numeric column.
>>
>> Anil Maliyekkel
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle