[R] mgcv: how select significant predictor vars when using gam(...select=TRUE) using automatic optimization

Simon Wood s.wood at bath.ac.uk
Thu Apr 18 15:24:38 CEST 2013


Jan,

Thanks for this. Is there any chance that you could send me the data off 
list and I'll try to figure out what is happening? (Under the 
understanding that I'll only use the data for investigating this issue, 
of course).

best,
Simon

On 18/04/13 11:11, Jan Holstein wrote:
> Simon,
>
> thanks for the reply. I believe I'm up to date, using
>   mgcv 1.7-22.
> Upgrading to R 3.0.0 also didn't change anything.
>
> Unfortunately using method="REML" does not make any difference:
>
> ####### first with "select=FALSE"
>> fit <- gam(target ~ s(mgs) + s(gsd) + s(mud) + s(ssCmax),
>>            family = quasi(link = log), data = wspe1,
>>            method = "REML", select = FALSE)
>> summary(fit)
>
> Family: quasi
> Link function: log
> Formula:
> target ~ s(mgs) + s(gsd) + s(mud) + s(ssCmax)
> Parametric coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept)   -4.724      7.462  -0.633    0.527
> Approximate significance of smooth terms:
>              edf Ref.df      F p-value
> s(mgs)    3.118  3.492  0.099   0.974
> s(gsd)    6.377  7.044 15.596  <2e-16 ***
> s(mud)    8.837  8.971 18.832  <2e-16 ***
> s(ssCmax) 3.886  4.051  2.342   0.052 .
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> R-sq.(adj) =  0.403   Deviance explained = 40.6%
> REML score =  33186  Scale est. = 8.7812e+05  n = 4511
>
> #### Then using "select=T"
>
>> fit2 <- gam(target ~ s(mgs) + s(gsd) + s(mud) + s(ssCmax),
>>             family = quasi(link = log), data = wspe1,
>>             method = "REML", select = TRUE)
>> summary(fit2)
> Family: quasi
> Link function: log
> Formula:
> target ~ s(mgs) + s(gsd) + s(mud) + s(ssCmax)
> Parametric coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept)   -6.406      5.239  -1.223    0.222
> Approximate significance of smooth terms:
>              edf Ref.df     F p-value
> s(mgs)    2.844      8 25.43  <2e-16 ***
> s(gsd)    6.071      9 14.50  <2e-16 ***
> s(mud)    6.875      8 21.79  <2e-16 ***
> s(ssCmax) 3.787      8 18.42  <2e-16 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> R-sq.(adj) =    0.4   Deviance explained = 40.1%
> REML score =  33203  Scale est. = 8.8359e+05  n = 4511
>
> I played around with other families/link functions with no success regarding
> the "select" behaviour.
>
> Well, look at the structure of my data:
> <http://r.789695.n4.nabble.com/file/n4664586/screen-capture-1.png>
>
> All possible predictor variables in principle look like this, and taken
> alone, each one is significant according to its p-value (but they cannot
> all be significant at the same time).
> In theory, the target variable should be a hypersurface in 11-dimensional
> space with lots of noise, but interactions of more than 2 variables get
> costly (let alone 11), and often enough (even without interactions) the
> solution does not converge at minimal step size. If it does, results are
> usually not as good as without interactions.
>
> Any comment/advice on model setup is warmly welcome here.
>
> Since I don't want to try out all possible 2047 combinations of up to eleven
> predictor variables for each target variable, I currently see no other way
> than educated manual guessing.
>
> If you know another way of (semi-)automated model tuning/reduction, I would
> very much appreciate it.
>
> best regards,
> Jan


-- 
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283
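
For readers following the thread, here is a minimal sketch of the comparison described in the quoted message: the same model fitted with select=FALSE and select=TRUE, with the per-term edf and p-values placed side by side. It is not the original analysis. The wspe1 data are not available here, so the sketch assumes simulated data from mgcv's gamSim() (example 1, where x3 has no effect by construction) and a Gaussian response instead of quasi(link=log). The names fit_plain, fit_select, cmp and n_subsets are illustrative only.

```r
library(mgcv)  # provides gam() and gamSim()

set.seed(1)
# Simulated data (assumption, not the wspe1 data from the thread):
# y depends on x0, x1 and x2; x3 has no effect by construction.
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)

form <- y ~ s(x0) + s(x1) + s(x2) + s(x3)

# Same model twice; only the select argument differs.
fit_plain  <- gam(form, data = dat, method = "REML", select = FALSE)
fit_select <- gam(form, data = dat, method = "REML", select = TRUE)

# Side-by-side effective degrees of freedom and p-values per smooth term.
cmp <- cbind(
  edf_plain  = summary(fit_plain)$s.table[, "edf"],
  edf_select = summary(fit_select)$s.table[, "edf"],
  p_plain    = summary(fit_plain)$s.table[, "p-value"],
  p_select   = summary(fit_select)$s.table[, "p-value"]
)
print(round(cmp, 3))

# Number of non-empty subsets of 11 candidate predictors,
# i.e. the count of models mentioned in the quoted message.
n_subsets <- 2^11 - 1
```

With select=TRUE, each smooth gets an extra penalty that can shrink it towards zero, so comparing the edf columns is usually more informative than comparing the p-values alone. How closely this toy setup mirrors the behaviour reported above is an open question.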


