[Rd] reference counting problem in .Primitive's?

Fri Apr 24 00:44:19 CEST 2009

On Thu, 23 Apr 2009, William Dunlap wrote:

>> -----Original Message-----
>> From: luke at stat.uiowa.edu [mailto:luke at stat.uiowa.edu]
>> Sent: Thursday, April 23, 2009 11:06 AM
>> To: William Dunlap
>> Cc: r-devel at r-project.org
>> Subject: Re: [Rd] reference counting problem in .Primitive's?
>>
>> On Thu, 23 Apr 2009, William Dunlap wrote:
>>
>>> I think the following rather weird expressions show a problem in how
>>> some of the .Primitive functions evaluate their arguments.  I haven't
>>> yet thought of a way that a nonabusive user might run into this problem.
>>> In each case the first argument, x, is modified in the course of
>>> evaluating the second argument and then modified x gets used
>>> as the first argument:
>>>
>>>> x<-as.integer(1:5); y <- x + { x[3]<-33L ; 1L } ; y
>>> [1]  2  3 34  5  6
>>>> x<-2^(0:4) ; y <- log(x, { x[3]<-64 ; 2 }) ; y
>>> [1] 0 1 6 3 4
>>>
>>> The reason I think it looks like a sharing problem (and not an order
>>> of evaluation problem) is that if your modification to x causes it to
>>> use a new block of memory then the unmodified version of x gets
>>> used as the first argument.  E.g.,
>>>
>>>> x<-as.integer(1:5) ; y <- x + { x[3]<-33.3; 1L} ; y
>>> [1] 2 3 4 5 6
>>>
>>> I haven't yet thought of a way that a nonabusive user might run
>>> into this problem.
>
> An hour after writing this one of our support folks sent me some
> user-written code that contained something very close to this idiom;
> the second argument to ":" is an altered version of the first argument:
>
>   lengths<-5:1 ; start<-1
>   for(i in seq(along=lengths)) {
>        thisSeq <- start:((start <- start + lengths[i])-1)
>        print(thisSeq)
>   }
>   [1] 1 2 3 4 5
>   [1] 6 7 8 9
>   [1] 10 11 12
>   [1] 13 14
>   [1] 15
>
> That works.  However, if that user had also used 'start[] <- ' instead
> of 'start <- ' then they would have run into this bug:
>
>  lengths<-5:1 ; start<-1
>  for(i in seq(along=lengths)) {
>        thisSeq <- start:((start[] <- start + lengths[i])-1)
>        print(thisSeq)
>  }
>  [1] 1 2 3 4 5
>  [1] 10  9
>  [1] 13 12
>  [1] 15 14
>  [1] 16 15
>
> If they use start[] or start[1] consistently in the call to ":" then
> they don't hit the bug.

Unless you know of somewhere where it is guaranteed that evaluation
order for : is left to right, this code is buggy.  (At one point I
either had or seriously thought about having codetools warn about
assignments in arguments other than in a very limited number of
cases.)
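
A minimal sketch of how the loop quoted above could be written so that
it does not depend on the evaluation order of the arguments to ":".
The names 'last' and 'seqs' are illustrative and are not in the user's
code; the only change is that the assignment is moved out of the call.

```r
# Sketch: same sequences as the quoted loop, with no assignment
# inside the arguments of ":".
# 'last' and 'seqs' are hypothetical names used for illustration.
lengths <- 5:1
start <- 1
seqs <- vector("list", length(lengths))
for (i in seq_along(lengths)) {
    last <- start + lengths[i] - 1   # compute the upper bound first
    seqs[[i]] <- start:last          # both arguments are plain symbols
    print(seqs[[i]])
    start <- last + 1                # advance only after ":" has returned
}
```

Written this way the result is the same whether 'start' is updated
with 'start <- ' or 'start[] <- '.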

As I said previously unless I can convince myself that the current
behavior isn't consistent with _some_ evaluation order in each case
(even if it changes with changes in expressions used) then I don't
think it is worth doing anything about other than explicitly stating
that evaluation order is undefined.
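
To make "consistent with _some_ evaluation order" concrete, here is a
sketch that spells out the two possible orders for the quoted
expression x + { x[3]<-33L ; 1L } using temporaries.  The names 'lhs',
'rhs', 'y_left_first' and 'y_right_first' are illustrative only; this
shows the two candidate results and says nothing about which one a
given version of R will produce for the original expression.

```r
# Left argument first: read x before the second argument modifies it.
x <- as.integer(1:5)
lhs <- x                  # value the first argument would see
x[3] <- 33L               # side effect of the second argument
rhs <- 1L
y_left_first <- lhs + rhs

# Right argument first: modify x, then read it as the first argument.
x <- as.integer(1:5)
x[3] <- 33L
rhs <- 1L
lhs <- x
y_right_first <- lhs + rhs

print(y_left_first)
print(y_right_first)
```

The 33L example quoted above matches the right-first result, and the
33.3 example matches the left-first result.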

>
>>
>> You are probably right.  I have not yet looked at the code but am
>> virtually certain it does not try to temporarily bump up the NAMED
>> values on argument values.  Doing so would cure this but probably at
>> serious cost to performance, as NAMED values of 2 cannot be brought
>> down again and so cause copying on next modify. (Might be worth
>> running some tests on that though to see what the cost would be).
>
> So, if NAMED were not limited to 0, 1, or 2 this sort of thing might be
> avoided with less pain?

If we had full reference counting I think we could avoid this fairly
easily, but I'm not convinced it is worth avoiding as there are good
reasons to allow indeterminacy in order of evaluation (compiler
optimizations, parallelization, and such) and in any case going to
full reference counting is not realistic without a full rewrite of the
engine (and has its own potential performance issues).
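
For readers less familiar with NAMED, a small sketch of the
user-visible guarantee it exists to protect: once a value is reachable
through two bindings, modifying it through one must not change the
other.  This does not inspect NAMED itself; the comments describe the
intended semantics, not the internals.

```r
# Sketch of copy-on-modify at the R level.
x <- as.integer(1:5)
y <- x           # x and y may share one block of memory here
y[3] <- 33L      # the value is shared, so the change goes to a copy

print(x)
print(y)
```

The examples at the top of the thread are cases where an argument
value was not marked as shared in this way while the other argument
was being evaluated.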

luke

>> I'm not sure if it is written anywhere that arguments of primitives
>> (BUILTINS in particular as those are always strict; SPECIALS can be
>> non-strict but log is strict) are evaluated in any particular order.
>> All these examples are consistent with _some_ evaluation order, but
>> not the same one.  It might be possible to show that the results
>> obtained in these situations will always be consistent with some
>> evaluation order, in which case documenting that order of evaluation
>> is unspecified would be good enough for me.  It may also be possible
>> that an order that does compound expressions first and then symbols
>> would also solve the issue (I don't think I would want to do this in
>> the interpreter though because of the performance overhead.)
>>
>> luke
>>
>>
>>>
>>> Bill Dunlap
>>> TIBCO Software Inc - Spotfire Division
>>> wdunlap tibco.com
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>> --
>> Luke Tierney
>> Chair, Statistics and Actuarial Science
>> Ralph E. Wareham Professor of Mathematical Sciences
>> University of Iowa                  Phone:             319-335-3386
>> Department of Statistics and        Fax:               319-335-3017
>>     Actuarial Science
>> 241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
>> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
>>
>
