[R] mgcv: how select significant predictor vars when using gam(...select=TRUE) using automatic optimization

Wed Apr 17 16:50:11 CEST 2013

I have 11 candidate predictor variables and use them to model quite a few
target variables.
I am looking for a consistent and, if possible, automatic way to identify
the significant predictors among the eleven, and thought the option
"select=TRUE" might do it.

Example (here with only 4 predictors):
First the vanilla fit with "select=FALSE":

> fit1<-gam(target~s(mgs)+s(gsd)+s(mud)+s(ssCmax),family=quasi(link=log),data=wspe1,select=FALSE)
> summary(fit1)

Family: quasi 
Link function: log 
Formula:
target ~ s(mgs) + s(gsd) + s(mud) + s(ssCmax)
Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   -34.57      20.47  -1.689   0.0913 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Approximate significance of smooth terms:
            edf Ref.df      F  p-value    
s(mgs)    2.335  2.623  0.260    0.829    
s(gsd)    6.868  7.506 13.955  < 2e-16 ***
s(mud)    8.990  9.000 11.727  < 2e-16 ***
s(ssCmax) 6.770  6.978  6.664 7.68e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

R-sq.(adj) =  0.402   Deviance explained = 40.4%
GCV score = 8.8563e+05  Scale est. = 8.8053e+05  n = 4511

Then turn on select=TRUE:

> fit2<-gam(target~s(mgs)+s(gsd)+s(mud)+s(ssCmax),family=quasi(link=log),data=wspe1,select=TRUE)
> summary(fit2)

Family: quasi 
Link function: log 

Formula:
target ~ s(mgs) + s(gsd) + s(mud) + s(ssCmax)
Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.1585     1.7439   0.091    0.928
Approximate significance of smooth terms:
            edf Ref.df     F p-value    
s(mgs)    2.456      8 24.50  <2e-16 ***
s(gsd)    7.272      9 14.33  <2e-16 ***
s(mud)    7.678      9 20.38  <2e-16 ***
s(ssCmax) 6.556      9 14.36  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
R-sq.(adj) =  0.397   Deviance explained =   40%
GCV score = 8.9209e+05  Scale est. = 8.8715e+05  n = 4511

I do not seem to fully understand how "select" works.
The predictor "mgs" is clearly not significant in "fit1" (p = 0.829
above), yet in "fit2" it appears as highly significant. Why was it not
dropped? How are non-significant predictors identified?
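
For anyone who wants to reproduce the comparison without my data, here is a
minimal sketch on simulated data. It is only an illustration under stated
assumptions: it uses mgcv's gamSim(eg = 1) instead of wspe1, a Gaussian
family instead of quasi(link=log), and the names dat, form, fit_plain,
fit_select and edf_tab are made up for this example. As far as I read
?gam.selection, select=TRUE does not remove terms from the formula; it adds
an extra penalty so that a smooth can be shrunk towards zero effective
degrees of freedom. So the sketch compares the edf column of both fits
rather than looking for a dropped term.

```r
library(mgcv)

set.seed(1)
# Simulated data, NOT wspe1: gamSim(eg = 1) returns y and x0..x3,
# where x3 has no true effect on y.
dat <- gamSim(eg = 1, n = 400, verbose = FALSE)

# Same formula for both fits; only the select argument differs.
form <- y ~ s(x0) + s(x1) + s(x2) + s(x3)
fit_plain  <- gam(form, data = dat, select = FALSE)
fit_select <- gam(form, data = dat, select = TRUE)

# Effective degrees of freedom per smooth term, side by side.
edf_tab <- cbind(
  plain  = summary(fit_plain)$s.table[, "edf"],
  select = summary(fit_select)$s.table[, "edf"]
)
print(round(edf_tab, 3))
```

A term whose edf in the "select" column is close to zero has effectively
been penalized out of the model. In my fit2 above, s(mgs) still has an edf
of 2.456, so by this reading it was not shrunk away. The p-values printed
by summary() are a separate, approximate test and need not agree between
the two fits.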
