[R-sig-ME] lmer and p-values

Ben Bolker bbolker at gmail.com
Mon Mar 28 23:56:38 CEST 2011


  [cc'ing back to r-sig-mixed-models]

On 03/28/2011 05:42 PM, Iker Vaquero Alba wrote:
> 
>    Split plot simplification is the way I was taught first, and is also
> recommended in Zuur et al 2009 for lme.
>    So, do you think I should just use mcmcsamp() and get the p-values
> and the statistic values from there, without any simplification? Is this
> something that happens only for lme4 (I mean, originally I was told to
> do split plot for glm and lme)?
> 
>    Thank you.  Iker

   I know Zuur et al. suggest this approach, but I disagree with them.
If you'd like to respond, I'm curious about the justification for the
simplification.  If it's "to eliminate unnecessary complexity so the
remaining parameters are estimated more precisely", I think it's
basically bogus.  (It seems sensible on the surface, but this kind of
data-guided simplification basically doesn't work.)

   I tend to fudge things a bit: I will often drop (1) RE terms that are
obviously overfitted/unidentifiable (i.e., terms that either cause
computational errors or warnings, or are estimated with zero variance or
perfect positive or negative correlations), and (2) non-significant
(especially higher-order) interaction terms.  I justify (1) on the
grounds of computational stability (I worry that overfitted models may
screw up the estimation of other terms), and both (1) and (2) on ease of
interpretation, although I know this may lead to inflated type I errors.
(In the case of an estimated zero variance, dropping the term shouldn't
make a difference anyway.)  I would try to guess in advance how complex
a model my data could support, in order to minimize the need for
fudging.
  If I were doing an analysis where mistakes would cost lives I would
not fudge in this way.
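  For concreteness, here is a minimal base-R sketch of the kind of check
I mean for (1); flag_overfit() and the numbers below are made up, but
the stddev/correlation structure matches what lme4's VarCorr() returns
for each grouping factor (e.g. vc <- VarCorr(m); attr(vc$group,
"stddev")):

```r
## Hypothetical helper: flag random-effects terms whose estimated
## variance is (near) zero or whose estimated correlations are (near)
## +/- 1 -- the usual signatures of an overfitted RE term.
flag_overfit <- function(sds, cors, tol = 1e-4) {
  list(zero_variance       = names(sds)[sds < tol],
       perfect_correlation = any(abs(cors[lower.tri(cors)]) > 1 - tol))
}

## Toy intercept/slope term: zero slope variance, perfect correlation.
sds  <- c(`(Intercept)` = 1.2, x = 0)
cors <- matrix(1, 2, 2, dimnames = list(names(sds), names(sds)))
flag_overfit(sds, cors)
```

(With a real fit, I'd drop or simplify any term this flags before
interpreting the rest of the model.)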

  I note that in examples I have seen from Doug Bates (I won't give
exact references here, but I think there are some in PB 2000 and in his
on-line book draft) he does do small amounts of backward simplification
of these types.  The inflation of type I errors should (guessing here!)
matter least for really big data sets, where (a) almost everything one
chooses to test will be statistically significant anyway and (b)
residual df are large.

  I would indeed do what you suggest above (test hypotheses based on
elimination from the full model), for mixed *or* non-mixed models.
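  As a quick numeric illustration of the LRT/F-test equivalence
mentioned further down the thread (the LRT statistic value here is made
up):

```r
## A 1-df likelihood ratio test is an F test with 1 numerator df and
## infinite denominator df; a finite denominator df is more
## conservative (larger p-value).
lrt <- 5.3  # made-up LRT statistic
p_chisq <- pchisq(lrt, df = 1, lower.tail = FALSE)
p_finf  <- pf(lrt, df1 = 1, df2 = Inf, lower.tail = FALSE)
p_f10   <- pf(lrt, df1 = 1, df2 = 10,  lower.tail = FALSE)
stopifnot(isTRUE(all.equal(p_chisq, p_finf)))  # identical
stopifnot(p_f10 > p_chisq)  # finite df is more conservative
```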

  Happy for any corrections or disagreements from the list.

  Ben Bolker

> 
> --- On Mon, 28/3/11, Ben Bolker <bbolker at gmail.com> wrote:
> 
> 
>     From: Ben Bolker <bbolker at gmail.com>
>     Subject: Re: [R-sig-ME] lmer and p-values
>     To: "Iker Vaquero Alba" <karraspito at yahoo.es>
>     CC: r-sig-mixed-models at r-project.org
>     Date: Monday, 28 March 2011, 23:18
> 
>     On 03/28/2011 01:04 PM, Iker Vaquero Alba wrote:
>     >
>     >    Ok, I have had a look at the mcmcsamp() function. If I've got it
>     > right, it generates an MCMC sample from the parameters of a model
>     > fitted with "lmer" or a similar function.
>     >
>     >    But my question now is: even if I cannot trust the p-values from
>     > the ANOVA comparing two different models that differ in one term, is
>     > it still OK if I simplify the model that way until I get my Minimum
>     > Adequate Model, and then use mcmcsamp() to get a trustworthy p-value
>     > for the terms I'm interested in from this MAM? Or should I use
>     > mcmcsamp() directly on my maximal model and simplify it according to
>     > the p-values obtained with it?
>     >
>     >    Thank you. Iker
> 
>       Why are you simplifying the model in the first place?  (That is a real
>     question, with only a tinge of prescriptiveness.) Among the active
>     contributors to this list and other R lists, I would say that the most
>     widespread philosophy is that one should *not* do backwards elimination
>     of (apparently) superfluous/non-significant terms in the model.  (See
>     myriad posts by Frank Harrell and others.)
> 
>       If you do insist on eliminating terms, then the LRT (anova()) p-values
>     are no more or less reliable for the purposes of elimination than they
>     are for the purposes of hypothesis testing.
> 
> 
> 
>     >
>     > --- On Mon, 28/3/11, Ben Bolker <bbolker at gmail.com> wrote:
>     >
>     >
>     >     From: Ben Bolker <bbolker at gmail.com>
>     >     Subject: Re: [R-sig-ME] lmer and p-values
>     >     To: r-sig-mixed-models at r-project.org
>     >     Date: Monday, 28 March 2011, 18:27
>     >
>     >     Iker Vaquero Alba <karraspito at ...> writes:
>     >
>     >     >
>     >     >
>     >     >    Dear list members:
>     >     >
>     >     >    I am fitting a model with lmer, because I need to fit
>     >     > some nested as well as non-nested random effects in it. I am
>     >     > doing a split-plot simplification, dropping terms from the
>     >     > model and comparing the models with and without each term.
>     >     > When doing an ANOVA between one model and its simplified
>     >     > version, I get, as a result, a chi-square value with 1 df (df
>     >     > from the bigger model minus df from the simplified one) and an
>     >     > associated p-value.
>     >     >
>     >     >    I was just wondering if it's correct to present these
>     >     > chi-square and p-values as the result of testing the effect
>     >     > of a certain term in the model. I am a bit confused, as if I
>     >     > were doing this same analysis with lme, I would be getting
>     >     > F-values and associated p-values.
>     >     >
>     >
>     >       When you do anova() in this context you are doing a
>     >     likelihood ratio test, which is equivalent to doing an F test
>     >     with 1 numerator df and a very large (infinite) denominator df.
>     >       As Pinheiro and Bates 2000 point out, this is
>     >     dangerous/anticonservative if your data set is small, for some
>     >     value of "small".
>     >        Guessing an appropriate denominator df, or using mcmcsamp(),
>     >     or parametric bootstrapping, or something, will be necessary if
>     >     you want a more reliable p-value.
>     >
>     >     _______________________________________________
>     >     R-sig-mixed-models at r-project.org mailing list
>     >     https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>     >
>

