[R-sig-ME] lmer and p-values (variable selection)

John Maindonald john.maindonald at anu.edu.au
Tue Mar 29 00:15:55 CEST 2011


A slightly more accommodating position is that some selection 
may be acceptable if it makes little difference to the magnitudes of
parameter estimates and to the interpretations that can be placed
upon them.  [Since writing this, I notice that Ben has now posted a
message that makes broadly similar follow-up points.]

The usual interpretations of p-values assume, among other things, 
a known model.  This assumption is invalidated if there has been
some element of backward elimination or other form of variable
selection.  Following variable selection, a p-value is no longer, 
strictly, a valid p-value.

Eliminating a term with a p-value greater than, say, 0.15 or 0.2 is,
however, likely to make little difference to the estimates of other
terms in the model.  Thus, it may be a reasonable way to proceed.  For
this purpose, an anti-conservative (smaller than it should be) p-value
will usually serve.
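
For concreteness, a minimal sketch of that check in R (the data frame
dat and the names y, x1, x2 and the grouping factor g are all
hypothetical placeholders):

library(lme4)

## Fit with REML = FALSE so that models differing in their fixed
## effects are comparable.
full    <- lmer(y ~ x1 + x2 + (1 | g), data = dat, REML = FALSE)
reduced <- update(full, . ~ . - x2)  # drop the term with the large p-value

## If the retained estimates barely move, the elimination has made
## little difference to the other terms in the model.
cbind(full    = fixef(full)[c("(Intercept)", "x1")],
      reduced = fixef(reduced))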

Nowadays it is, of course, relatively easy to run a simulation that
checks the effect of a particular variable elimination/selection
strategy.  If variable elimination/selection has been used, and
anything of consequence hangs on the results, such a check should
surely be standard practice.
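
One possible shape for such a simulation (the generating model, the
effect size b1, the sample sizes and the 0.15 threshold below are all
illustrative assumptions): generate data in which one term is truly
null, apply the drop-then-refit rule, and see how the estimate of the
term of interest behaves across many runs.

library(lme4)

one_run <- function(n_group = 10, n_per = 5, b1 = 0.5) {
  g  <- factor(rep(seq_len(n_group), each = n_per))
  x1 <- rnorm(n_group * n_per)
  x2 <- rnorm(n_group * n_per)                 # truly null term
  y  <- b1 * x1 + rnorm(n_group)[g] + rnorm(n_group * n_per)
  full    <- lmer(y ~ x1 + x2 + (1 | g), REML = FALSE)
  reduced <- update(full, . ~ . - x2)
  p_x2 <- anova(reduced, full)[2, "Pr(>Chisq)"]
  fit  <- if (p_x2 > 0.15) reduced else full   # the selection rule
  fixef(fit)[["x1"]]
}

est <- replicate(200, one_run())
c(mean = mean(est), sd = sd(est))  # compare the mean with the true b1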

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125) 3473   fax : +61 2 (6125) 5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On 29/03/2011, at 8:18 AM, Ben Bolker wrote:

> On 03/28/2011 01:04 PM, Iker Vaquero Alba wrote:
>> 
>>   OK, I have had a look at the mcmcsamp() function. If I've got it
>> right, it generates an MCMC sample of the parameters of a model
>> fitted with lmer() or a similar function.
>> 
>>   But my question now is: even if I cannot trust the p-values from the
>> ANOVA comparing two models that differ in one term, is it still OK to
>> simplify the model that way until I reach my minimum adequate model
>> (MAM), and then use mcmcsamp() to get a trustworthy p-value for the
>> terms I'm interested in from this MAM?  Or should I use mcmcsamp()
>> directly on my maximal model and simplify it according to the p-values
>> obtained from that?
>> 
>>   Thank you. Iker
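
[A minimal sketch of the workflow described above, assuming the
contemporary (pre-1.0) lme4 in which mcmcsamp() was available; the
model and variable names are placeholders:

library(lme4)  # a pre-1.0 lme4; mcmcsamp() was removed later

fit  <- lmer(y ~ x1 + (1 | g), data = dat)
samp <- mcmcsamp(fit, n = 1000)  # MCMC sample of the model parameters
HPDinterval(samp)                # highest posterior density intervals
]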
> 
>  Why are you simplifying the model in the first place?  (That is a real
> question, with only a tinge of prescriptiveness.) Among the active
> contributors to this list and other R lists, I would say that the most
> widespread philosophy is that one should *not* do backwards elimination
> of (apparently) superfluous/non-significant terms in the model.  (See
> myriad posts by Frank Harrell and others.)
> 
>  If you do insist on eliminating terms, then the LRT (anova()) p-values
> are no more or less reliable for the purposes of elimination than they
> are for the purposes of hypothesis testing.
> 
> 
> 
>> 
>> --- On Mon, 28/3/11, Ben Bolker <bbolker at gmail.com> wrote:
>> 
>> 
>>    From: Ben Bolker <bbolker at gmail.com>
>>    Subject: Re: [R-sig-ME] lmer and p-values
>>    To: r-sig-mixed-models at r-project.org
>>    Date: Monday, 28 March 2011, 18:27
>> 
>>    Iker Vaquero Alba <karraspito at ...> writes:
>> 
>>> 
>>> 
>>>   Dear list members:
>>> 
>>>   I am fitting a model with lmer, because I need to fit some nested
>>> as well as non-nested random effects in it.  I am doing a split-plot
>>> simplification, dropping terms from the model and comparing the
>>> models with and without the term.  When doing an ANOVA between one
>>> model and its simplified version, I get as a result a chi-square
>>> value with 1 df (df from the bigger model minus df from the
>>> simplified one) and an associated p-value.
>>> 
>>>   I was just wondering if it's correct to present these chi-square
>>> and p-values as the result of testing the effect of a certain term in
>>> the model.  I am a bit confused because, if I were doing this same
>>> analysis with lme, I would be getting F-values and associated
>>> p-values.
>>> 
>> 
>>      When you do anova() in this context you are doing a likelihood
>>    ratio test, which is equivalent to doing an F test with 1 numerator
>>    df and a very large (infinite) denominator df.
>>      As Pinheiro and Bates (2000) point out, this is
>>    dangerous/anticonservative if your data set is small, for some
>>    value of "small".
>>      Guessing an appropriate denominator df, or using mcmcsamp(), or
>>    parametric bootstrapping, or something, will be necessary if you
>>    want a more reliable p-value.
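
[A sketch of the parametric bootstrap mentioned above, assuming nested
lmer fits 'reduced' and 'full' (both with REML = FALSE) and a version
of lme4 that provides simulate() and refit():

library(lme4)

pboot_lrt <- function(reduced, full, nsim = 1000) {
  obs <- as.numeric(2 * (logLik(full) - logLik(reduced)))
  null_stats <- replicate(nsim, {
    y_star <- simulate(reduced)[[1]]  # response generated under the null
    2 * as.numeric(logLik(refit(full, y_star)) -
                   logLik(refit(reduced, y_star)))
  })
  mean(null_stats >= obs)  # bootstrap p-value
}
]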



