[R] Issue of reproducibility with gam and lm.wfit in different versions of R

Sebastien Bihorel sebastien.bihorel at cognigencorp.com
Tue Oct 24 03:57:52 CEST 2017


Dear R users, 

I recently stumbled upon problems of reproducibility while running GAM analyses in different R and gam package versions. In the example below, a small dataset is created in which the y and x1 variables are 100% correlated. The intents of this example were primarily for regression testing and, secondarily, to evaluate how the gam algorithm behaves under extreme/limit conditions. 

I ran this little snippet in 5 different environments and got 100% consistent results until I switched to R 3.3.2. 
* Comparing results from environments 1, 2, and 3 shows that changing the version of the gam package did not change the results under R 3.3.0. 
* Comparing results from environments 3 and 4 shows that changing the version of R altered the values of the AIC and the output of the step.gam call (changed to a NULL object) 
* Comparing results from environments 4 and 5 shows that reverting to an older version of the gam package in R 3.3.2 still produced altered AIC values and the NULL output from step.gam call 

Further investigations into these differences seem to show that the lm.wfit call in the gam.fit function (called from within gam and step.gam) may result in different values in R 3.3.0 and 3.3.2. 

So my questions are 2-fold: 
1- Would you have any information about why lm.wfit would produce different outcomes in R 3.3.0 and 3.3.2? The source code does not appear significantly different.
2- Is it expected that the step.gam function returns a NULL object in R 3.3.2 while a valid model has been identified? Looking at the source of step.gam, the line 157 (if(is.null(form.list)) break) seems to be the reason the function breaks out and returns a NULL value. 

I thank you in advance for your time. 

Sebastien Bihorel 






library(gam) 

dat <- data.frame( 
y = c(57,57,98,83,122,69,108,86,80,87,75,76,97,101,121,111,105,84,65,54,61,71,125,60,50,112,102,110,77,45,93,62,120,115,70,113,117,85,46,123,89,95,116,55,110,92,109,100,72,88,105,119,94,45,67,58,60,45,107,73,100,79,47,99,51,53,68,125,90,48,82,85,65,52,70,59,125,49,118,103,91,124,78,81,63,63), 
x1 = c(52,52,93,78,117,64,103,81,75,82,70,71,92,96,116,106,100,79,60,49,56,66,120,55,45,107,97,105,72,40,88,57,115,110,65,108,112,80,41,118,84,90,111,50,105,87,104,95,67,83,100,114,89,40,62,53,55,40,102,68,95,74,42,94,46,48,63,120,85,43,77,80,60,47,65,54,120,44,113,98,86,119,73,76,58,58), 
x2 = c(0.0001,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0001,0) 
) 

summary(dat) 

scope <- list( 
x1=c('1','x1','ns(x1, df=2)'), 
x2=c('1','x2') 
) 

gam.object <- gam(y~1, data=dat) 
step.object <- step.gam(gam.object, scope=scope, trace=2) 
is.null(step.object) 

# Run this if you want to evaluate one lm.wfit example call made from within gam functions
x <- cbind(rep(1,nrow(dat)), dat$x1)
names(x) <- c('Intercept', 'x1')
z <- dat$y
w <- rep(1,nrow(dat))
str(eval(expression(lm.wfit(x, z, w, method = "qr", singular.ok = TRUE))))





###################################################################### 
#1 R 3.1.2 (x86_64-redhat-linux-gnu (64-bit)) / gam 1.09.1 
###################################################################### 
> step.object <- step.gam(gam.object, scope=scope, trace=2) 
Start: y ~ 1; AIC= 797.0034 
Trial: y ~ x1 + 1 ; AIC= -5121.796 
Trial: y ~ 1 + x2 ; AIC= 796.915 
Step:1 y ~ x1 ; AIC= -5121.796 
Trial: y ~ ns(x1, df=2) + 1 ; AIC= -5123.816 
Trial: y ~ x1 + x2 ; AIC= -5174.408 
Step:2 y ~ x1 + x2 ; AIC= -5174.408 
Trial: y ~ ns(x1, df=2) + x2 ; AIC= -5164.829 
> is.null(step.object) 
[1] FALSE 

###################################################################### 
#2: R 3.3.0 (x86_64-redhat-linux-gnu (64-bit)) / gam 1.12 
###################################################################### 
> step.object <- step.gam(gam.object, scope=scope, trace=2) 
Start: y ~ 1; AIC= 797.0034 
Trial: y ~ x1 + 1 ; AIC= -5121.796 
Trial: y ~ 1 + x2 ; AIC= 796.915 
Step:1 y ~ x1 ; AIC= -5121.796 
Trial: y ~ ns(x1, df=2) + 1 ; AIC= -5123.816 
Trial: y ~ x1 + x2 ; AIC= -5174.408 
Step:2 y ~ x1 + x2 ; AIC= -5174.408 
Trial: y ~ ns(x1, df=2) + x2 ; AIC= -5164.829 
> is.null(step.object) 
[1] FALSE 

###################################################################### 
#3: R 3.3.0 (x86_64-redhat-linux-gnu (64-bit)) / gam 1.14-4 
###################################################################### 
> step.object <- step.gam(gam.object, scope=scope, trace=2) 
Start: y ~ 1; AIC= 797.0034 
Trial: y ~ x1 + 1 ; AIC= -5121.796 
Trial: y ~ 1 + x2 ; AIC= 796.915 
Step:1 y ~ x1 ; AIC= -5121.796 
Trial: y ~ ns(x1, df=2) + 1 ; AIC= -5123.816 
Trial: y ~ x1 + x2 ; AIC= -5174.408 
Step:2 y ~ x1 + x2 ; AIC= -5174.408 
Trial: y ~ ns(x1, df=2) + x2 ; AIC= -5164.829 
> is.null(step.object) 
[1] FALSE 

###################################################################### 
#4: R 3.3.2 (x86_64-redhat-linux-gnu (64-bit)) / gam 1.14-4 
###################################################################### 
> step.object <- step.gam(gam.object, scope=scope, trace=2) 
Start: y ~ 1; AIC= 797.0034 
Trial: y ~ x1 + 1 ; AIC= -5122.886 
Trial: y ~ 1 + x2 ; AIC= 796.915 
Step:1 y ~ x1 ; AIC= -5122.886 
Trial: y ~ ns(x1, df=2) + 1 ; AIC= -5177.531 
Trial: y ~ x1 + x2 ; AIC= -5177.531 
Step:2 y ~ ns(x1, df = 2) ; AIC= -5177.531 
Trial: y ~ ns(x1, df=2) + x2 ; AIC= -5222.703 
Step:3 y ~ ns(x1, df = 2) + x2 ; AIC= -5222.703 
> is.null(step.object) 
[1] TRUE 

###################################################################### 
#5: R 3.3.2 (x86_64-redhat-linux-gnu (64-bit)) / gam 1.09.1 
###################################################################### 
> step.object <- step.gam(gam.object, scope=scope, trace=2) 
Start: y ~ 1; AIC= 797.0034 
Trial: y ~ x1 + 1 ; AIC= -5122.886 
Trial: y ~ 1 + x2 ; AIC= 796.915 
Step:1 y ~ x1 ; AIC= -5122.886 
Trial: y ~ ns(x1, df=2) + 1 ; AIC= -5177.531 
Trial: y ~ x1 + x2 ; AIC= -5177.531 
Step:2 y ~ ns(x1, df = 2) ; AIC= -5177.531 
Trial: y ~ ns(x1, df=2) + x2 ; AIC= -5222.703 
Step:3 y ~ ns(x1, df = 2) + x2 ; AIC= -5222.703 
> is.null(step.object) 
[1] TRUE



More information about the R-help mailing list