[Rd] Randomness not due to seed

Mon Jul 25 18:49:55 CEST 2011

On Tue, Jul 19, 2011 at 8:13 AM, jeroen00ms <jeroen.ooms at stat.ucla.edu> wrote:
> I am working on a reproducible computing platform for which I would like to
> be able to _exactly_ reproduce an R object. However, I am experiencing
> unexpected randomness in some calculations. I have a hard time finding out
> exactly how it occurs. The code below illustrates the issue.
>
> mylm1 <- lm(dist~speed, data=cars);
> mylm2 <- lm(dist~speed, data=cars);
> identical(mylm1, mylm2); #TRUE
>
> makelm <- function(){
>        return(lm(dist~speed, data=cars));
> }
>
> mylm1 <- makelm();
> mylm2 <- makelm();
> identical(mylm1, mylm2); #FALSE
>
> When inspecting both objects there seem to be some rounding differences.
> Setting a seed does not make a difference. Is there any way I can remove
> this randomness and exactly reproduce the object every time?
>

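William Dunlap was correct.  The sequence of comparisons below shows
that the "terms" component is what makes identical() fail.  Everything
else associated with this model (the coefficients, the R-squared, the
covariance matrix, and so on) matches exactly.  The two terms objects
differ only in their ".Environment" attribute: each call to makelm()
evaluates the formula in a fresh function environment, so the two fits
record different environments.  Nothing random is involved, which is
why setting a seed has no effect.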

> mylm1 <- lm(dist~speed, data=cars);
> mylm2 <- lm(dist~speed, data=cars);
> identical(mylm1, mylm2); #TRUE
[1] TRUE
> makelm <- function(){
+        return(lm(dist~speed, data=cars));
+ }
> mylm1 <- makelm();
> mylm2 <- makelm();
> identical(mylm1, mylm2); #FALSE
[1] FALSE
> identical(coef(mylm1), coef(mylm2))
[1] TRUE
> identical(summary(mylm1), summary(mylm2))
[1] FALSE
> identical(coef(summary(mylm1)), coef(summary(mylm2)))
[1] TRUE
> all.equal(mylm1, mylm2)
[1] TRUE
> identical(summary(mylm1)$r.squared, summary(mylm2)$r.squared)
[1] TRUE
> identical(summary(mylm1)$adj.r.squared, summary(mylm2)$adj.r.squared)
[1] TRUE
> identical(summary(mylm1)$sigma, summary(mylm2)$sigma)
[1] TRUE
> identical(summary(mylm1)$fstatistic, summary(mylm2)$fstatistic)
[1] TRUE
> identical(summary(mylm1)$residuals, summary(mylm2)$residuals)
[1] TRUE
> identical(summary(mylm1)$cov.unscaled, summary(mylm2)$cov.unscaled)
[1] TRUE
> identical(summary(mylm1)$call, summary(mylm2)$call)
[1] TRUE
> identical(summary(mylm1)$terms, summary(mylm2)$terms)
[1] FALSE
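
The one-by-one checks above can also be automated. The following is
only a sketch of that idea: it loops over the top-level components of
the two fits and reports which ones identical() rejects. The name
`same` is mine, not anything from lm().

```r
# Refit inside a function, as in the original report.
makelm <- function() lm(dist ~ speed, data = cars)
mylm1 <- makelm()
mylm2 <- makelm()

# Compare the two fits component by component; the result is a named
# logical vector with one entry per component of the lm object.
same <- sapply(names(mylm1),
               function(nm) identical(mylm1[[nm]], mylm2[[nm]]))

# Names of the components that are not identical.
names(same)[!same]
```

This only narrows the search to a component; the attributes of that
component still have to be inspected by hand, as done next.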

> summary(mylm2)$terms
dist ~ speed
attr(,"variables")
list(dist, speed)
attr(,"factors")
      speed
dist      0
speed     1
attr(,"term.labels")
[1] "speed"
attr(,"order")
[1] 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: 0x1b76ae0>
attr(,"predvars")
list(dist, speed)
attr(,"dataClasses")
     dist     speed
"numeric" "numeric"
>
> summary(mylm1)$terms
dist ~ speed
attr(,"variables")
list(dist, speed)
attr(,"factors")
      speed
dist      0
speed     1
attr(,"term.labels")
[1] "speed"
attr(,"order")
[1] 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: 0x1cf06b8>
attr(,"predvars")
list(dist, speed)
attr(,"dataClasses")
     dist     speed
"numeric" "numeric"

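If exact reproducibility of the object matters more than keeping the
original environment, one possible workaround is to overwrite the
stored environment after fitting. This is a sketch under my own
assumptions, not a recommended practice: strip_env() is a hypothetical
helper, and it assumes the only places an environment is stored are
the "terms" component and the "terms" attribute of the model frame.

```r
makelm <- function() lm(dist ~ speed, data = cars)
fit1 <- makelm()
fit2 <- makelm()

# Each call to makelm() has its own evaluation environment, and the
# terms object remembers it.
env_differs <- !identical(attr(terms(fit1), ".Environment"),
                          attr(terms(fit2), ".Environment"))

# strip_env() is a hypothetical helper: it points the stored
# environments at globalenv() so they no longer vary between calls.
strip_env <- function(fit) {
  attr(fit$terms, ".Environment") <- globalenv()
  attr(attr(fit$model, "terms"), ".Environment") <- globalenv()
  fit
}

# Compare the two fits again once the environments agree.
same_after_strip <- identical(strip_env(fit1), strip_env(fit2))
```

Be aware that functions such as update() or predict() may use the
formula's environment to look up variables, so changing it could
alter their behaviour when the data lived inside the function.
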
-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas