[R-sig-ME] lmer and p-values
Ben Bolker
bbolker at gmail.com
Mon Mar 28 23:56:38 CEST 2011
[cc'ing back to r-sig-mixed-models]
On 03/28/2011 05:42 PM, Iker Vaquero Alba wrote:
>
> Split plot simplification is the way I was taught first, and is also
> recommended in Zuur et al 2009 for lme.
> So, do you think I should just use mcmcsamp() and get the p-values
> and the statistic values from there, without any simplification? Is this
> something that happens only for lme4 (I mean, originally I was told to
> do split plot for glm and lme)?
>
> Thank you. Iker
I know Zuur et al. suggest this approach, but I disagree with them. If
you'd like to respond, I'm curious to hear the justification for the
simplification. If it's "to eliminate unnecessary complexity so that the
remaining parameters are estimated more precisely", I think it's
basically bogus. (It seems sensible on the surface, but this kind of
data-guided simplification basically doesn't work.)
I tend to fudge things a bit: I will often drop (1) RE terms that are
obviously overfitted/unidentifiable (i.e., they either cause
computational errors or warnings, or are estimated with zero variance or
perfect positive or negative correlations), and (2) non-significant
(especially higher-order) interaction terms. I justify this practice on
the grounds of computational stability for (1) (I worry that overfitted
models may screw up the estimation of other terms) and ease of
interpretation for both (1) and (2), although I know that it may lead to
inflated type I errors. (In the case of an estimated zero variance,
dropping the term shouldn't make a difference anyway.) I would try to
guess in advance how complex a model my data could support, in order to
minimize the need for fudging.
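For concreteness, the kind of check I have in mind for (1) is roughly
the following (an untested sketch with made-up model/variable names, not
code from anyone's actual analysis):

  library(lme4)
  fm <- lmer(y ~ x + (x | group), data = mydata)  # placeholder model
  VarCorr(fm)  # variances estimated at (or essentially at) zero, or
               # correlations estimated at exactly +1 or -1, flag the
               # overfitted/unidentifiable terms I would consider dropping
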
If I were doing an analysis where mistakes would cost lives I would
not fudge in this way.
I note that in examples I have seen from Doug Bates (I won't give exact
references here, but I think there are some in PB 2000 and in his
on-line book draft) he does do small amounts of backward simplification
of these types. The inflation of type I errors should (guessing here!)
be least important for really big data sets, where (a) almost everything
one chooses to test will be statistically significant anyway and (b) the
residual df are large.
I would indeed do what you suggest above (test hypotheses based on
elimination from the full model), for mixed *or* non-mixed models.
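In code, that comparison would look something like this (an untested
sketch with placeholder names; note the ML fits, since LRTs on fixed
effects shouldn't be based on REML fits):

  library(lme4)
  full    <- lmer(y ~ x1 + x2 + (1 | group), data = mydata, REML = FALSE)
  reduced <- update(full, . ~ . - x2)  # drop the term being tested
  anova(reduced, full)                 # LRT: chi-square statistic, 1 df
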
Happy for any corrections or disagreements from the list.
Ben Bolker
>
> --- On Mon, 28/3/11, Ben Bolker <bbolker at gmail.com> wrote:
>
>
> From: Ben Bolker <bbolker at gmail.com>
> Subject: Re: [R-sig-ME] lmer and p-values
> To: "Iker Vaquero Alba" <karraspito at yahoo.es>
> CC: r-sig-mixed-models at r-project.org
> Date: Monday, 28 March 2011, 23:18
>
> On 03/28/2011 01:04 PM, Iker Vaquero Alba wrote:
> >
> > Ok, I have had a look at the mcmcsamp() function. If I've got it
> > right, it generates an MCMC sample from the parameters of a model
> > fitted with "lmer" or a similar function.
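
Roughly right -- the usage is something like the following (an untested
sketch with placeholder data/variable names; mcmcsamp() is fairly
fragile and may balk at some model structures):

  library(lme4)
  fm   <- lmer(response ~ treatment + (1 | plot), data = mydata)
  samp <- mcmcsamp(fm, n = 10000)  # MCMC sample of the fitted parameters
  HPDinterval(samp)                # credible intervals for the fixed
                                   # effects and variance components
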
> >
> > But my doubt now is: even if I cannot trust the p-values from the
> > ANOVA comparing two different models that differ in a term, is it
> > still OK if I simplify the model that way until I get my Minimum
> > Adequate Model, and then use mcmcsamp() to get a trustworthy p-value
> > for the terms I'm interested in from this MAM, or should I use
> > mcmcsamp() directly on my maximum model and simplify it according to
> > the p-values obtained with it?
> >
> > Thank you. Iker
>
> Why are you simplifying the model in the first place? (That is a real
> question, with only a tinge of prescriptiveness.) Among the active
> contributors to this list and other R lists, I would say that the most
> widespread philosophy is that one should *not* do backwards elimination
> of (apparently) superfluous/non-significant terms in the model. (See
> myriad posts by Frank Harrell and others.)
>
> If you do insist on eliminating terms, then the LRT (anova()) p-values
> are no more or less reliable for the purposes of elimination than they
> are for the purposes of hypothesis testing.
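
For what it's worth, reasonably recent versions of lme4 let you get all
of those single-term-deletion LRT p-values in one call -- an untested
sketch, with 'fitted_model' as a placeholder:

  drop1(fitted_model, test = "Chisq")  # LRT p-values for dropping each
                                       # term from the full model
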
>
>
>
> >
> > --- On Mon, 28/3/11, Ben Bolker <bbolker at gmail.com> wrote:
> >
> >
> > From: Ben Bolker <bbolker at gmail.com>
> > Subject: Re: [R-sig-ME] lmer and p-values
> > To: r-sig-mixed-models at r-project.org
> > Date: Monday, 28 March 2011, 18:27
> >
> > Iker Vaquero Alba <karraspito at ...> writes:
> >
> > >
> > >
> > > Dear list members:
> > >
> > > I am fitting a model with lmer, because I need to fit some nested
> > > as well as non-nested random effects in it. I am doing a split-plot
> > > simplification, dropping terms from the model and comparing the
> > > models with or without the term. When doing an ANOVA between one
> > > model and its simplified version, I get, as a result, a chi-square
> > > value with 1 df (df from the bigger model - df from the simplified
> > > one), and an associated p-value.
> > >
> > > I was just wondering if it's correct to present these chi-square
> > > and p-values as the result of testing the effect of a certain term
> > > in the model. I am a bit confused, as if I was doing this same
> > > analysis with lme, I would be getting F-values and associated
> > > p-values.
> > >
> >
> > When you do anova() in this context you are doing a likelihood ratio
> > test, which is equivalent to doing an F test with 1 numerator df and
> > a very large (infinite) denominator df.
> > As Pinheiro and Bates 2000 point out, this is
> > dangerous/anticonservative if your data set is small, for some value
> > of "small".
> > Guessing an appropriate denominator df, or using mcmcsamp(), or
> > parametric bootstrapping, or something, will be necessary if you
> > want a more reliable p-value.
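
To expand on "parametric bootstrapping" above: simulate response vectors
from the reduced (null) model, refit both models to each simulated
response, and compare the observed likelihood-ratio statistic to that
null distribution. A bare-bones sketch (untested, placeholder names,
assuming lme4's simulate() and refit() methods and ML fits):

  pbLRT <- function(full, reduced, nsim = 500) {
    obs  <- 2 * as.numeric(logLik(full) - logLik(reduced))  # observed LRT
    null <- numeric(nsim)
    for (i in seq_len(nsim)) {
      ysim    <- simulate(reduced)[[1]]          # simulate under the null
      null[i] <- 2 * as.numeric(logLik(refit(full, ysim)) -
                                logLik(refit(reduced, ysim)))
    }
    mean(null >= obs)                            # bootstrap p-value
  }
  pbLRT(full, reduced)  # 'full'/'reduced' fitted with REML = FALSE
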
> >
> > _______________________________________________
> > R-sig-mixed-models at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
>