[R] A reproducibility puzzle with NORM

Fri Sep 21 17:57:44 CEST 2007

On Fri, 21 Sep 2007, Ted.Harding at manchester.ac.uk wrote:

> On 21-Sep-07 13:44:40, Prof Brian Ripley wrote:
>> Norm uses a Box-Muller normal RNG, and rngseed does not reset
>> its state (it has some Fortran save variables).  So if you ask
>> for an odd number of normals and call rngseed, the next normal
>> 'generated' is the second half of the last pair with the
>> previous seed.
>>
>> Ideally packages should be converted to use R's number generators
>> which do not have such quirks.
>
> Many thanks! That is indeed a deeply hidden quirk.
> (It should be documented!)
>
> Presumably it applies also to 'mix'? I'm even wondering if
> it might apply to 'cat' (which I'm supposed to be a maintainer
> of); in principle it should not, since there's no unavoidable
> necessity to use normal RNs there, hence no need for a
> Box-Muller RNG; but one never knows. I'd better look!
>
> Trying to think of a work-round for immediate purposes,
> but there seems to be nothing obvious which would avoid
> the problem (e.g. testing for odd/even in number of
> imputations), since one cannot be sure of how many normal
> RNs have been called for during the DA and Imputation
> steps.
>
> Maybe one can add a bit of code to rngseed() to administer
> a salutary kick to something.

Unfortunately not, as the variable is stored internally to another Fortran 
function and so not accessible to rngseed.  It's easy in C to have static 
variables shared between functions, which is how R solves this.

I only put package mix together to meet the needs of one of my students, 
and put it on CRAN when someone else asked for it.  If I had an ongoing 
need for it I would certainly make it a lot more robust.

>
> Thanks!
> Ted.
>
>>
>> On Fri, 21 Sep 2007, Ted.Harding at manchester.ac.uk wrote:
>>
>>> Hi Folks,
>>> I'm using the 'norm' package (based on Shafer's NORM)
>>> on some  data. In outline, (X,Y) are bivariate normal,
>>> var(X)=0.29, var(Y)=24.4, cov(X,Y)=-0.277,
>>> there are some 900 cases, and some 170 values of Y
>>> have been set "missing" (NA).
>>>
>>> The puzzle is that, repeating the multiple imputation
>>> starting from the same random seed, I get different
>>> answers from the repeats depending if I do an odd number
>>> of imputations, but the same answer on the repeats
>>> if I do en even number (which includes the second
>>> repeat of an odd number).
>>>
>>> It may possibly have something to do with how I've
>>> written the code for the loop, but if so then I'm
>>> not seeing it!
>>>
>>> CODE:
>>>
>>> ## Set up the situation:
>>> Data<-read.csv("MyData.csv")
>>> X<-Data$X; Y<-Data$Y
>>> ##(If you want to try it, set your own data here)
>>> Raw<-cbind(X,Y)
>>> library(norm)
>>>
>>> ## Initialise stuff
>>> s<-prelim.norm(Raw)
>>> t0<-em.norm(s)
>>>
>>> ##########################
>>> ## Set the Random Seed
>>> rngseed(31425)
>>>
>>> ## Do the first imputation:
>>> t     <-  da.norm(s,t0,steps=20)
>>> Imp   <- imp.norm(s,t, Raw)
>>> X.Imp <- Imp[,1]; Y.Imp<-Imp[,2]
>>>
>>> ## Now do the rest, and accumulate lists of the results
>>> ## Est.Imp = list of estimated coeffs
>>> ## SE.Imp  = list of SEs of estimated coeffs:
>>> Est.Imp <- list(summary(lm(Y.Imp~X.Imp))$coef[,1])
>>> SE.Imp  <- list(summary(lm(Y.Imp~X.Imp))$coef[,2])
>>> N=4
>>> for(i in (2:N)){
>>>  t<-da.norm(s,t,steps=20)
>>>  Imp<-imp.norm(s,t,Raw)
>>>  X.Imp<-Imp[,1]; Y.Imp<-Imp[,2]
>>>  Est.Imp<-c(Est.Imp,list(summary(lm(Y.Imp~X.Imp))$coef[,1]))
>>>  SE.Imp <-c( SE.Imp,list(summary(lm(Y.Imp~X.Imp))$coef[,2]))
>>> }
>>>
>>> ## Finally, combine the imputations:
>>> mi.inference(Est.Imp,SE.Imp)
>>>
>>>
>>> I've illustrated N=4 (even) above, for 4 imputations.
>>>
>>> Now, I run the code repeatedly from "## Set the Random Seed"
>>> down to "mi.inference(Est.Imp,SE.Imp)"
>>>
>>> With N=4, I always get the same result.
>>>
>>> If I set N=5, I alternately get different results:
>>> The second run is different from the first, but the
>>> third is the same as the first, and the fourth is the
>>> same as the second, ...
>>>
>>> In general, for even N, it is as for N=4, and for odd N
>>> it is as for N=5.
>>>
>>> Any ideas???
>>>
>>> Thanks,
>>> Ted.
>>>
>>> --------------------------------------------------------------------
>>> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
>>> Fax-to-email: +44 (0)870 094 0861
>>> Date: 21-Sep-07                                       Time: 14:13:27
>>> ------------------------------ XFMail ------------------------------
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> --
>> Brian D. Ripley,                  ripley at stats.ox.ac.uk
>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford,             Tel:  +44 1865 272861 (self)
>> 1 South Parks Road,                     +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 21-Sep-07                                       Time: 15:12:27
> ------------------------------ XFMail ------------------------------
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595