[R] gam (mgcv), multiple imputation, f-stats/p-values, and summary(gam)

Wed May 8 18:33:16 CEST 2013

For smooths the method is described in

Wood 2013 On p-values for smooth components of an extended generalized 
additive model, Biometrika 100(1),221-228

http://biomet.oxfordjournals.org/content/early/2012/10/18/biomet.ass048.full.pdf+html

best,
Simon

On 08/05/13 15:12, Andrew Crane-Droesch wrote:
> Dear All,
>
> I'm using gam for a project that involves multiple imputation, and it
> has led me to a question about how f-statistics/p-values work in gam.
> Specifically, how do the values in summary(gam) get generated?  As is
> made clear by the dumb example below, I'm manipul;ating gam objects to
> reflect the MI procedures that I'm using.  I don't trust the
> f-statistics/p-values that I'm getting, but I'd like to know how to
> further manipulate these objects to get trustworthy values.  Part of
> that invovles knowing how the output in summary(gam) gets generated, and
> from what elements of a gam object.
>
> Here is the example:
>
> library(mgcv)
> par(mfrow=c(2,2))
> impute <- function (a, a.impute){
>      ifelse (is.na(a), a.impute, a)
>      }
>
> #make some dumb fake data
> x1.knots = c(-1,-.5,0,.5,1)
> x2.knots = c(-1,-.5,0,.5,1)
> K=5
> x1 = rnorm(100)
> x2 = rnorm(100)
> y = .05*exp(x1)-.5*x1 + .1*x2 + x1*x2 + rnorm(100)
> #some of it is missing
> x1[81:100] = NA
> x2[71:90] = NA
>
> #do a few dumb imputations, and fit models to the partially-imputed data
> x1.imp = impute(x1, rnorm(100))
> x2.imp = impute(x2, rnorm(100))
> m1 = gam(y~te(x1.imp,x2.imp,k=c(K,K)),knots = list(x1.imp = x1.knots,
> x2.imp = x2.knots))
> plot(m1,plot.type="contour",scheme=2,main="Imp 1")
>
> x1.imp = impute(x1, rnorm(100))
> x2.imp = impute(x2, rnorm(100))
> m2 = gam(y~te(x1.imp,x2.imp,k=c(K,K)),knots = list(x1.imp = x1.knots,
> x2.imp = x2.knots))
> plot(m2,plot.type="contour",scheme=2,main="Imp 2")
>
> x1.imp = impute(x1, rnorm(100))
> x2.imp = impute(x2, rnorm(100))
> m3 = gam(y~te(x1.imp,x2.imp,k=c(K,K)),knots = list(x1.imp = x1.knots,
> x2.imp = x2.knots))
> plot(m3,plot.type="contour",scheme=2,main="Imp 3")
>
> results = list(m1,m2,m3)
>
> #Combine the results according to rubin's rules
> reps=3
> bhat=results[[1]]$coeff
> for (i in 2:reps){
>      bhat=bhat+results[[i]]$coeff
>      }
> bhat = bhat/reps
>
> W=results[[1]]$Vp
> for (i in 2:reps){
>      W = W+results[[i]]$Vp
>      }
> W = W/reps
> B= (results[[1]]$coeff-bhat) %*% t(results[[1]]$coeff-bhat)
> for (i in 2:reps){
>      B = B+(results[[i]]$coeff-bhat) %*% t(results[[i]]$coeff-bhat)
>      }
> B=B/(reps-1)
> Vb = W+(1+1/reps)*B
>
> #Put the results into a convenient gam object
> MI = results[[1]]
> MI$coefficients=bhat
> MI$Vp = Vb
>
> #and summarize
> summary(MI)
> plot(MI,plot.type="contour",scheme=2,main="MI")
>
> When I do something similar with non-fake data I get wacky f-statistics
> and p-values.  For example, F could be >100 and p could equal 1.  This
> probably has something to do with degrees of freedom.
>
> My easy question:  what goes on under the hood with gam to generate
> these values?  What parts of a gam object are called up?
>
> My harder question:  how might one construct principled analogs of these
> statistics in an MI context, when degrees of freedom will vary across
> models, depending on the imputed data?  Has anybody thought about this,
> or do I have have some serious pencil-and-paper ahead of me?
>
> Thanks,
> Andrew
>
>
>
>


-- 
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283