[R-sig-ME] Model selection in GAMM

Ben Bolker bbolker at gmail.com
Mon Mar 18 14:29:30 CET 2013


Andrew Robinson <A.Robinson at ...> writes:

> 
> On Sun, Mar 17, 2013 at 07:44:28PM +0000, Ben Bolker wrote:
> > Mari Laine <mslain at ...> writes:
> > 
> > 
> > > I'm using GAMM (mgcv::gamm) to fit non-Gaussian models with several
> > > fixed factors and also a temporal correlation structure. Additionally,
> > > I'm fitting non-linearities with smoothers. I have several correlated
> > > main-effect candidates, and I would like to compare their GAMMs to see
> > > which of the models best fits the data. I would also need to tune the
> > > models by dropping unnecessary smoothers (effectively linear ones) and
> > > non-significant fixed variables.
> >  
> > > After going through some statistical books, I'm under the impression
> > > that one should not compare GAMM models via the AIC of their $lme
> > > components - is this correct? Could someone give instructions on
> > > model selection in GAMM, or refer me to a book or some other source
> > > of information on this matter?
> > 
> >   To my knowledge there are two issues here:
> > 
> > (1) GAMM uses penalized quasi-likelihood (PQL).  According to some
> > statisticians -- including Brian Ripley, who wrote the original PQL
> > code in MASS::glmmPQL, which might be what gamm relies on (I don't
> > remember; it might incorporate its own PQL code) -- one shouldn't use
> > likelihood-based approaches (including AIC) with PQL algorithms,
> > because they don't estimate a true likelihood.  Others say it's OK as
> > long as you make sure to scale the likelihood to get a
> > quasi-likelihood before combining it with the penalty term to get a
> > QIC.
> 
> Hi Ben,
> 
> I know that you're reporting third-party opinions, and you're not
> necessarily advocating this position yourself, but I wonder if you can
> provide some more information - even a link or a citation to them?

  Hmm.  I may be in over my head here, so *please* treat anything
I say below as false until proven true ...

> 
> I'm a bit confused about how scaling the maximized penalized
> likelihood (MPL) can deliver something that can be treated as though
> it were a maximized likelihood.
> 
> To my way of thinking, a point of maximizing the penalized likelihood
> is that you get an estimate with better statistical properties than
> the MLE.  

  It's been a long time since I looked at this, and I don't claim to
understand it well, but my understanding was that the quantity maximized
by PQL is an *approximation* to the marginal likelihood used in other
GLMM approaches (i.e. the likelihood of the data conditional on fixed
values of the random effects, weighted by the density of those RE values
under the assumed RE distribution, and integrated over all possible RE
values) -- one that is less accurate than the Laplace approximation or
Gauss-Hermite quadrature, but an approximation to the same quantity
nevertheless.
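
  In symbols (my notation, nothing official), the target being
approximated is the usual GLMM marginal likelihood

  L(\beta, \theta) = \int f(y | b, \beta) g(b | \theta) db

where f is the conditional density of the data y given the random
effects b, and g is the assumed density of the REs with parameters
\theta.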

  So I thought the idea was not (in this case) to get an estimate
with better properties than the MLE, but to get an estimate of the
marginal likelihood at all ...

> It is overwhelmingly likely that this MPL estimate will be different
> from the MLE.  It's hard for me to imagine a way that the PL function
> can be scaled so that the MPLE can be treated as though it were an
> MLE.  If the MPLE won't be the same as the MLE, then the L can't be
> maximized at the MPLE.  Further, the PL function will be a different
> shape at the optimum (I intuit) than the L would be at its optimum.  
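
  (For concreteness: the penalized log-likelihood maximized in the GAM
setting is, roughly in Simon Wood's notation,

  l_p(\beta) = l(\beta) - (1/2) \sum_j \lambda_j \beta^T S_j \beta

where the \lambda_j are smoothing parameters and the S_j are penalty
matrices, so whenever some \lambda_j > 0 the penalty will generally
pull the optimum away from the MLE -- which I take to be your point.)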

    For one quick practical counterexample, glmmPQL generally gives
_similar_ answers to glmer/glmmADMB etc. (i.e. methods based on
Laplace or better approximations) for large data sets and those
where the counts per individual sample are large ...
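
  A minimal sketch of the kind of comparison I mean, using the cbpp
data that ships with lme4 (assumes MASS, lme4, and nlme are installed):

  library(MASS)   ## glmmPQL
  library(lme4)   ## glmer and the cbpp data

  ## the same binomial GLMM fitted by PQL and by Laplace approximation
  fit_pql <- glmmPQL(cbind(incidence, size - incidence) ~ period,
                     random = ~ 1 | herd, family = binomial,
                     data = cbpp, verbose = FALSE)
  fit_lap <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
                   family = binomial, data = cbpp)

  ## fixed-effect estimates: typically close, though not identical
  cbind(PQL = nlme::fixef(fit_pql), Laplace = lme4::fixef(fit_lap))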
> 
> So, is there theory to suggest that the properties of the MLE that are
> relied upon by the various measures of information are retained by the
> MPLE?  And if so, even then, wouldn't those properties depend on the
> nature of the penalization?

   I would love to know the answer.  I hope someone more knowledgeable,
or with much more time to spend figuring this out, comes forward.

  Ben


> 
> > (2) As I recall, it's a little tricky to figure out which components
> > of a gamm fit contain which kinds of information about the fit.
> > In particular, it's not clear whether the likelihood/AIC reported
> > in the lme component of the fit really reflects an appropriate
> > (quasi)likelihood/IC of the full model; I believe there's a lot
> > of detail on this in the ?gamm help page: in particular,
> > 
> > > For example,
> > > unlike ‘glmmPQL’ from ‘MASS’ it will return the complete ‘lme’
> > > object from the working model at convergence of the PQL iteration,
> > > including the `log likelihood', even though this is not the
> > > likelihood of the fitted GAMM.
> > 
> > which suggests you shouldn't use that log-likelihood ...
> > 
> >   I would look for further information in Simon Wood's excellent book,
> > "Generalized Additive Models: An Introduction with R".
> > 
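
  To make my quoted point (2) above concrete: a gamm fit returns both
a $gam and an $lme component, and it is the latter that carries the
tempting but inappropriate `log likelihood'.  A minimal sketch,
following the Poisson simulation setup from mgcv's own examples:

  library(mgcv)
  set.seed(1)
  dat <- gamSim(1, n = 400, dist = "poisson", scale = 0.25)

  ## non-Gaussian family, so gamm fits by PQL-type iteration
  m <- gamm(y ~ s(x0) + s(x1), family = poisson, data = dat)

  summary(m$gam)  ## the GAM side: smooth terms, approximate p-values
  logLik(m$lme)   ## working model's `log likelihood' -- NOT the
                  ## likelihood of the fitted GAMM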


