[R-sig-ME] how to know if random factors are significant?

John Maindonald John.Maindonald at anu.edu.au
Thu Apr 3 04:40:55 CEST 2008


An analogy with the Copenhagen interpretation of quantum mechanics
(not by any means the only interpretation on offer) seems to me
strained.  In that arena, there's a lot to be said for the "Shut up
and calculate" view that is favored by at least some physicists.  That
is not advice I'd want to give to mixed-effects modelers!  Rather, the
issue here has to do with a too cavalier use of Occam's razor, when
Fisher's "Make your hypotheses complex" is more pertinent.

Debate over the use of results from twin studies to partition effects
on measured IQ into environmental and genetic components illustrates
the point.  The variance components are relevant only in the
populations of parents who adopted one or other twin.  More to the
present point, the Flynn effect, by which there have been huge IQ
increases between one generation and the next, requires the invocation
of some mixture of environmental and genetic effects that are outside
the ken of both the twin studies data and the models used to analyze
those data.  In biology, do not expect anything to be simple.  As I
understand it, there have been a variety of attempts to explain the
Flynn effect, but no clear consensus.

The analyst ought to worry about what including or omitting the
disputed random effect implies for power (or effective sample size,
or ...) as well as for the p-value or CI limits.  The analyst who omits
the disputed random effect has to worry that both the p-value and the
power curve might be unreasonably optimistic.
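
To make the with/without comparison concrete, here is a minimal sketch
in R.  It uses simulated data, not Leonel's: the design (12 sites, 5
plots each, treatment applied to whole sites), the effect sizes, and
all object names are assumptions made up for illustration.

```r
library(nlme)  # recommended package, shipped with standard R installs

set.seed(1)                 # arbitrary seed, for reproducibility only
n_site <- 12                # hypothetical number of sites
n_per  <- 5                 # hypothetical number of plots per site
site   <- factor(rep(seq_len(n_site), each = n_per))

# Treatment is applied to whole sites, so site-to-site variation
# feeds directly into the uncertainty of the treatment effect.
treat <- factor(rep(c("A", "B"), each = n_site / 2))[as.integer(site)]

site_eff <- rnorm(n_site, sd = 2)   # assumed between-site SD
y <- 10 + 1 * (treat == "B") + site_eff[as.integer(site)] +
  rnorm(n_site * n_per, sd = 1)     # assumed residual SD
dat <- data.frame(y, treat, site)

# Model that keeps the disputed random effect (REML is lme's default)
fit_with <- lme(y ~ treat, random = ~ 1 | site, data = dat)
# Model that omits it
fit_without <- lm(y ~ treat, data = dat)

tab_with    <- summary(fit_with)$tTable
tab_without <- summary(fit_without)$coefficients

# Side-by-side inference for the fixed effect of interest
comparison <- data.frame(
  model     = c("with site", "without site"),
  estimate  = c(tab_with["treatB", "Value"],
                tab_without["treatB", "Estimate"]),
  std_error = c(tab_with["treatB", "Std.Error"],
                tab_without["treatB", "Std. Error"]),
  p_value   = c(tab_with["treatB", "p-value"],
                tab_without["treatB", "Pr(>|t|)"])
)
print(comparison)
```

In a design like this one, the fit that omits the site term would be
expected to report the smaller standard error and p-value, which is
the optimism at issue.  How large the gap is in real data depends on
the design and on the true site variance, so this is an illustration
under the stated assumptions rather than a general rule.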

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.


On 2 Apr 2008, at 9:27 PM, MHH Stevens wrote:

>
> On Apr 2, 2008, at 3:35 AM, Rune Haubo wrote:
>> On 02/04/2008, John Maindonald <john.maindonald at anu.edu.au> wrote:
>>> There was a related question from Mariana Martinez a day or two ago.
>>> Before removing a random term that background knowledge or past
>>> experience with similar data suggests is likely, check what
>>> difference it makes to the p-values for the fixed effects that are
>>> of interest.  If it makes a substantial difference, caution demands
>>> that it be left in.
>>>
>>> To pretty much repeat my earlier comment:
>>> If you omit the component then you have to contemplate the alternatives:
>>> 1) the component really was present but undetectable
>>> 2) the component was not present, or so small that it could be
>>> ignored, and the inference from the model that omits it is valid.
>>>
>>> If (1) has a modest probability, and it matters whether you go with
>>> (1) or (2), going with (2) leads to a very insecure inference. The
>>> p-value that comes out of the analysis is unreasonably optimistic;
>>> it is wrong and misleading.
> Can "caution" ever cause us to select the more "optimistic" model?
> If we assume that the absence of the random effect reduces the
> p-value of the fixed effect, we might ponder the situation in which
> there is a meaningful risk associated with ignoring type II error
> (that we erroneously accept the null hypothesis). Imagine field
> testing the effects of a pesticide on non-target organisms: does (2)
> result in a "minimum" p-value, or is the p-value, as John said,
> wrong and misleading?
>
> More generally, if a random effect has the real potential to exist
> (has a "modest probability"), but we don't see evidence for it in
> our particular data set, does it exist for us? (i.e. "If a tree
> falls ..." or worse, Schrödinger's proposition, "Is the cat dead if
> we don't look?"). I have typically acted as though it does not exist
> if I do not have evidence for it in MY data. However, when it does
> make a significant difference, I do lose sleep over it.
>
> -Hank
>>
>> I think this is a question of strategy. Leonel did put emphasis on
>> the random effect, and he might just be interested in the size and
>> significance of the random effect rather than the fixed effects.
>> Estimating and testing the random effect seems reasonable to me in
>> this case, although confidence intervals, as you mention below, also
>> provide good inference.
>>
>> It is always possible to discuss how much non-data information to
>> include in an analysis and I believe the answer depends very much on
>> the purpose of the research. If the research question regards the
>> size and "existence" of the variance of 'Site', then he might
>> conclude that it is so small compared to other effects in the
>> model/data that it has no place in the model.
>>
>> I think the question regarding "existence" of some effect can be
>> misleading in many cases, because one can always claim that any
>> effect is really there, and had we observed enough data, we would be
>> able to estimate the effect reliably. Leaving too many variables in
>> the model on which there is too little information also results in
>> bias in parameter estimates, so it is a trade-off. We often speak of
>> appropriate models, but the appropriateness depends on the purpose:
>> do we seek inference for a specific (set of) parameter(s), the system
>> as a whole, or do we want to use it for prediction?
>>
>> /Rune
>>>
>>> If you do anyway want a Bayesian credible interval, which you can
>>> treat pretty much as a confidence interval, for the random
>>> component, check Douglas Bates' message of a few hours ago, the
>>> first of two messages with the subject
>>> "lme4::mcmcsamp + coda::HPDinterval", re the use of the function
>>> HPDinterval().
>>>
>>>
>>> On 2 Apr 2008, at 4:02 AM, Leonel Arturo Lopez Toledo wrote:
>>>
>>>> Dear all:
>>>> I'm new to mixed models and I'm trying to understand the output
>>>> from "lme" in the nlme package. I hope my question is not too basic
>>>> for this mailing list. Really sorry if that is the case.
>>>> I especially have problems interpreting the random effect output. I
>>>> have only one random factor, which is "Site". I know the "Variance
>>>> and Stdev" indicate variation due to the random factor, but do they
>>>> indicate any significance? Is there any way to obtain a p-value for
>>>> the random effects? And in case it is not significant, how can I
>>>> remove it from the model? With "update (model,~.-)"?
>>>>
>>>> The variance in the first case (see below) is very low and in the
>>>> second example it is more considerable, but should I keep it in the
>>>> model or remove it?
>>>>
>>>> Thank you very much for your help in advance.
>>>>
>>>> EXAMPLE 1
>>>> Linear mixed-effects model fit by maximum likelihood
>>>> Data: NULL
>>>>      AIC      BIC    logLik
>>>> 277.8272 287.3283 -132.9136
>>>>
>>>> Random effects:
>>>> Formula: ~1 | Sitio
>>>>        (Intercept) Residual
>>>> StdDev: 0.0005098433 9.709515
>>>>
>>>> EXAMPLE 2
>>>> Generalized linear mixed model fit using Laplace
>>>> Formula: y ~Canopy*Area + (1 | Sitio)
>>>>  Data: tod
>>>> Family: binomial(logit link)
>>>>  AIC   BIC logLik deviance
>>>> 50.93 54.49 -21.46    42.93
>>>>
>>>> Random effects:
>>>> Groups Name        Variance Std.Dev.
>>>> Sitio  (Intercept) 0.25738  0.50733
>>>> number of obs: 18, groups: Sitio, 6
>>>>
>>>>
>>>> Leonel Lopez
>>>> Centro de Investigaciones en Ecosistemas-UNAM
>>>> MEXICO
>>>>
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>
> Dr. Hank Stevens, Assistant Professor
> 338 Pearson Hall
> Botany Department
> Miami University
> Oxford, OH 45056
>
> Office: (513) 529-4206
> Lab: (513) 529-4262
> FAX: (513) 529-4243
> http://www.cas.muohio.edu/~stevenmh/
> http://www.cas.muohio.edu/ecology
> http://www.muohio.edu/botany/
>
> "If the stars should appear one night in a thousand years, how would
> men believe and adore." -Ralph Waldo Emerson, writer and philosopher
> (1803-1882)
>