[R] GLM model vs. GAM model
Simon Wood
simon at stats.gla.ac.uk
Thu Oct 28 13:40:56 CEST 2004
For GAMs fitted using mgcv::gam, I think that there's an argument that the
most self consistent comparison would be based on the GCV/UBRE score for
the models, since this has usually been used to select the degree of
smoothness of the GAM model (you can fit the GLM using gam(), so this is
available for both models).
As you (and the help files) point out, the p-values produced by anova.gam
are only approximate, when the models are fitted using penalties on the
likelihood. There are two issues here, I think.
i) The calculations are conditional on the smoothing parameters, which
have usually been estimated.
ii) Given the smoothing parameters, the GAMs are estimated by penalized
likelihood maximization, not MLE.
(ii)is an issue when comparing two or more models using anova.gam, since
the calculations use the MLE theory based distributional results, but with
the model degrees of freedom set to the effective degrees of freedom of
the model. This is not as outrageous as it sounds: the GAM can always be
re-parameterized using the eigen-vectors of its penalty matrix. This
parameterization is orthogonal, and has a diagonal penalty matrix with a
elements that increase rather smoothly and steeply. This tends to mean
that, for any given smoothing parameter, some model parameters are
effectively zeroed while some are effectively un-penalized. If all parameters
were in one of these two categories then using distributional theory for
un-penalized GLMs, with model degrees of freedom set to the EDF of the GAM
would be perfectly legitimate. Of course in reality there are almost
always a number of parameters subject to intermediate amounts of
penalization. The approach used in anova.gam is effectively saying
that these are equivalent to a smaller number of unpenalized parameters,
which can only be an approximation!
Note that single argument calls to anova.gam do not use the MLE based
results.
Interestingly, taking a mixed model approach (e.g. mgcv::gamm, for simple
version) doesn't make things much better: in that case you *can* get
unpenalized MLE's, but testing for presence or absence of smooth terms
involves LRT's at the boundary of the parameter space, while other tests
are often conditional on the parameters of the random effect
distributions...
> I have a question about how to compare a GLM with a GAM model using anova
> function.
>
> A GLM is performed for example:
>
> model1 <-glm(formula = exitus ~ age+gender+diabetes, family = "binomial",
> na.action = na.exclude)
>
> A second nested model could be:
>
> model2 <-glm(formula = exitus ~ age+gender, family = "binomial", na.action =
> na.exclude)
>
>
>
> To compare these two GLM models the instruction is:
>
> anova(model1,model2, test="F")
>
>
>
> Similarly for GAM models
>
> model3 <-gam(formula = exitus ~ s(age)+gender, family = "binomial",
> na.action = na.exclude)
>
>
>
> "R" allows to compare these two models (GLM vs. GAM)
>
> anova(model2,model3, test="F")
>
> This instruction returns a p-value with no error or warning, but this test
> is based on maximum likelihood, and GAM models are not fitted with maximum
> likelihood criteria, thus I think this p-value is not correct.
>
> Please, I really appreciate any information about how to compare a GLM with
> a GAM model.
best,
Simon
_____________________________________________________________________
> Simon Wood simon at stats.gla.ac.uk www.stats.gla.ac.uk/~simon/
>> Department of Statistics, University of Glasgow, Glasgow, G12 8QQ
>>> Direct telephone: (0)141 330 4530 Fax: (0)141 330 4814
More information about the R-help
mailing list