[R-sig-eco] gam variable selection

Marco Helbich marco.helbich at gmx.at
Tue Sep 27 14:40:37 CEST 2011


Thank you for clarifying.
So I can remove them all at once.

Best
Marco

On 27.09.2011 13:50, Gavin Simpson wrote:
> On Tue, 2011-09-27 at 13:42 +0200, Marco Helbich wrote:
>> Gavin,
>>
>> thank you for your reply, I appreciate it!
>>
>> After consulting the suggested paper, I tried your suggestion of
>> setting "select = T", which raises another question:
>>
>> If the p-value is "NA", does this mean that the smooth term is dropped
>> (or shrunk to zero)? Independent of its high edf, is this predictor
>> (e.g. s(x1)) not relevant for explaining y?
>
> Those NA terms are ones that have effectively been penalised out of the
> model - the EDF are effectively zero for these terms and they explain no
> variance in the response. These predictors, s(x1) and s(x3), appear to
> have no relationship with y.
>
> You should also check whether there is concurvity - the
> multicollinearity problem, but for additive models. mgcv provides a
> function, concurvity(), to see whether this is a problem or not.
>
> HTH
>
> G
>
>>
>> E.g.:
>>             edf    Ref.df     F p-value
>> s(x1) 7.521e-09 1.402e-08 0.000      NA
>> s(x2) 5.408e+00 6.448e+00 3.049 0.00462 **
>> s(x3) 6.287e-09 1.217e-08 0.000      NA
>> s(x4) 2.152e+00 2.754e+00 5.037 0.00248 **
>>
>> Best
>> Marco
>>
>>
>> On 27.09.2011 11:40, Gavin Simpson wrote:
>>> On Tue, 2011-09-27 at 08:54 +0200, Marco Helbich wrote:
>>>> Dear list,
>>>>
>>>> I am studying the influence of several environmental factors (numeric &
>>>> dummies) on species densities (= numeric) using the gam()
>>>> function with a Gaussian family in the mgcv package. As stated in
>>>> Wood (2006), there is no variable selection algorithm.
>>>>
>>>> Is it an appropriate (iterative) approach to drop the least
>>>> significant predictor (e.g. p > 0.05), refit the model, compare the
>>>> GCV/AIC score, and so forth? Should I first focus on the smoothing
>>>> functions or the fixed effects? Or is such a distinction not important
>>>> at all?
>>>>
>>>> Perhaps someone has more experience with GAMs and can give me a helping
>>>> hand? Thanks in advance!
>>>
>>> You could do that, but I would be sceptical of the results.
>>>
>>> Marra and Wood (2011, Computational Statistics and Data Analysis 55;
>>> 2372-2387) compare various approaches for feature selection in GAMs.
>>> IIRC, they concluded that an additional penalty term in the smoothness
>>> selection procedure gave the best results. This can be activated in
>>> mgcv::gam() by using the `select = TRUE` argument/setting.
>>>
>>> HTH
>>>
>>> G
>>>
>>>> Best
>>>> Marco
>>>
>>
>
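
For readers following the thread, the two suggestions above (the extra
selection penalty via select = TRUE, and the concurvity check) can be
sketched roughly as follows. This is only an illustration, not the model
from the thread: the data are simulated with mgcv's gamSim(), the variable
names x0..x3 come from that simulation, and method = "REML" is an
assumption rather than something stated in the discussion.

```r
library(mgcv)

set.seed(1)
# Simulated example data (assumption): gamSim example 1, in which x3 has
# no true effect on y. This stands in for the real data set.
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)

# select = TRUE adds an extra penalty to each smooth, so that a whole
# term can be shrunk towards zero during smoothness selection.
m <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3),
         data = dat, family = gaussian(), method = "REML",
         select = TRUE)

# Per-term table of edf, Ref.df, F and p-value. Terms whose edf is
# close to zero have effectively been penalised out of the model.
s_tab <- summary(m)$s.table
print(s_tab)

# Concurvity measures: values close to 1 suggest that a smooth can be
# well approximated by the other terms in the model.
conc <- concurvity(m, full = TRUE)
print(conc)
```

Terms with an edf near zero in the printed table are the candidates for
removal. The exact values, and whether a p-value is reported for such
terms, may differ between mgcv versions.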