[R-sig-ME] how to know if random factors are significant?
kjbeath at kagi.com
Thu Apr 3 06:40:59 CEST 2008
On 02/04/2008, at 9:27 PM, MHH Stevens wrote:
> On Apr 2, 2008, at 3:35 AM, Rune Haubo wrote:
>> On 02/04/2008, John Maindonald <john.maindonald at anu.edu.au> wrote:
>>> There was a related question from Mariana Martinez a day or two ago.
>>> Before removing a random term that background knowledge or past
>>> experience with similar data suggests is likely, check what difference
>>> it makes to the p-values for the fixed effects that are of interest.
>>> If it makes a substantial difference, caution demands that it be left
>>> in.
>>> To pretty much repeat my earlier comment:
>>> If you omit the component then you have to contemplate two possibilities:
>>> 1) the component really was present but undetectable
>>> 2) the component was not present, or so small that it could be
>>> ignored, and the inference from the model that omits it is valid.
>>> If (1) has a modest probability, and it matters whether you go with
>>> (1) or (2), going with (2) leads to a very insecure inference.
>>> The p-value that comes out of the analysis is unreasonably optimistic;
>>> it is wrong and misleading.
> Can "caution" ever cause us to select the more "optimistic" model? If
> we assume that the absence of the random effect reduces the p-value
> of the fixed effect, we might ponder the situation in which there is
> a meaningful risk associated with ignoring type II error (that
> we erroneously accept the null hypothesis). Imagine field testing the
> effects of a pesticide on non-target organisms: does (2) result in
> a "minimum" p-value, or is the p-value, as John said, wrong and misleading?
> More generally, if a random effect has the real potential to exist
> (has a "modest probability"), but we don't see evidence for it in our
> particular data set, does it exist for us? (i.e. "If a tree
> falls ..." or worse, Schrödinger's cat: is the cat dead if we
> don't look?). I have typically acted as though it does not exist if I
> do not have evidence for it in MY data. However, when it does make a
> significant difference, I do lose sleep over it.
Incorrectly ignoring a random effect can either inflate or deflate the
apparent significance of fixed effects. In the common configuration
where a covariate is constant across a cluster, the effect of ignoring
the random effect is to increase significance. Provided it doesn't
produce numerical problems, including a non-existent random effect
probably isn't going to make much difference to the p-values.
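The claim about a covariate that is constant within a cluster can be checked with a small simulation. The sketch below is a minimal illustration in Python under stated assumptions (the thread itself uses R with nlme/lme4, and none of the names here come from those packages). It uses a hypothetical design with 5 sites per treatment arm, 10 observations per site, site SD 1, residual SD 1, and no true treatment effect. The "site-level" test, which analyses one mean per site, stands in for a random-intercept model. With balanced sites and a site-level treatment it closely mirrors that model's treatment test, but it is not the mixed-model fit itself.

```python
import math
import random

CLUSTERS_PER_ARM = 5   # hypothetical: sites per treatment arm
OBS_PER_CLUSTER = 10   # hypothetical: observations per site
Z_CRIT = 1.96          # two-sided 5% normal critical value (naive test, df is large)
T_CRIT_DF8 = 2.306     # two-sided 5% t critical value, df = 2 * 5 - 2 = 8


def _mean(xs):
    return math.fsum(xs) / len(xs)


def _var(xs):
    m = _mean(xs)
    return math.fsum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def _two_sample_t(a, b):
    """Pooled-variance two-sample t statistic for mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * _var(a) + (nb - 1) * _var(b)) / (na + nb - 2)
    return (_mean(b) - _mean(a)) / math.sqrt(pooled * (1 / na + 1 / nb))


def rejection_rates(n_sims=2000, sd_site=1.0, sd_resid=1.0, seed=1):
    """Type I error rates when treatment is assigned per site and has NO effect."""
    rng = random.Random(seed)
    naive = by_site = 0
    for _ in range(n_sims):
        arms = []
        for _arm in range(2):
            sites = []
            for _site in range(CLUSTERS_PER_ARM):
                u = rng.gauss(0.0, sd_site)  # random site effect shared within the site
                sites.append([u + rng.gauss(0.0, sd_resid)
                              for _ in range(OBS_PER_CLUSTER)])
            arms.append(sites)
        # Naive: ignore the site effect and treat every observation as independent.
        flat = [[y for site in sites for y in site] for sites in arms]
        if abs(_two_sample_t(flat[0], flat[1])) > Z_CRIT:
            naive += 1
        # Site-level: one mean per site, so site variation enters the error term.
        means = [[_mean(site) for site in sites] for sites in arms]
        if abs(_two_sample_t(means[0], means[1])) > T_CRIT_DF8:
            by_site += 1
    return naive / n_sims, by_site / n_sims


naive_rate, site_rate = rejection_rates()
print(f"naive: {naive_rate:.3f}  site-level: {site_rate:.3f}")
```

With these settings the naive test rejects far more often than its nominal 5%, while the site-level test stays close to 5%. The gap grows with the number of observations per site and with the site variance.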
One worry I have with checking for the existence of a random effect is
that I may have low power to detect the effect, so I don't find it,
but it may still be sufficient to inflate the type I error for a
covariate, maybe excessively so. For this reason it seems essential to
include a random effect where a fixed effect may operate in the same
way. If there are possible random effects that aren't related to
covariates, it is probably reasonable to exclude them if they don't
improve the model.
As an example, if I had repeated measurements on a subject over time
with a linear relationship, then I might have both a random effect for
constant and slope. Assume treatments are applied to different subjects.
Now if my model is that treatments only affect the mean response and I
find that a random effect for slope doesn't improve the model, then
excluding it may improve the fit of the covariates without causing
other problems. However if I model the effect of treatment on the
slope then I should have both random effects in the model (the random
effect for constant is needed because of the slope random effect) even
if the slope random effect doesn't seem necessary, in case there
really is a random effect for slope.
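The intercept-and-slope example can be written out as a model. The notation below is an assumption introduced for illustration, not taken from the thread. Here $y_{ij}$ is the response of subject $i$ at time $t_{ij}$, $\mathrm{trt}_i$ is the subject's treatment indicator, $b_{0i}$ and $b_{1i}$ are the random intercept ("constant") and random slope, and $\beta_3$ is the treatment effect on the slope:

```latex
y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2\,\mathrm{trt}_i + \beta_3\,\mathrm{trt}_i\,t_{ij}
       + b_{0i} + b_{1i}\,t_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} \sim N(\mathbf{0}, \Sigma),
\qquad
\varepsilon_{ij} \sim N(0, \sigma^2)
```

If treatment affects only the mean response, $\beta_3$ is dropped and the question is only whether $b_{1i}$ improves the model. If $\beta_3$ is of interest, subject-to-subject variation in slope is exactly what its standard error must reflect, which is the argument for keeping $b_{1i}$ (and $b_{0i}$ with it) even when $b_{1i}$ is not clearly detected.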
>> I think this is a question of strategy. Leonel did put emphasis on
>> the random effect, and he might just be interested in the size and
>> significance of the random effect rather than the fixed effects.
>> Estimating and testing the random effect seems reasonable to me in
>> this case, although confidence intervals, as you mention below, also
>> provide good inference.
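For the testing route, one common approach is a likelihood-ratio test comparing fits with and without the site random intercept. This is an assumption here, since nothing in the thread prescribes it. Because the null value (variance = 0) lies on the boundary of the parameter space, the statistic is usually referred to a 50:50 mixture of chi-square(0) and chi-square(1), which halves the naive chi-square(1) p-value. The Python sketch below only does that arithmetic. It assumes both log-likelihoods come from fits that differ only in that one random term, and that both are ML fits, or REML fits with identical fixed effects. The function name is hypothetical.

```python
import math


def lrt_variance_component(loglik_with, loglik_without):
    """LRT of H0: a single random-intercept variance equals 0.

    Returns (statistic, p-value) using the 50:50 chi-square(0)/chi-square(1)
    mixture that is commonly used for a variance tested on its boundary.
    """
    stat = 2.0 * (loglik_with - loglik_without)
    if stat <= 0.0:
        # No improvement in fit: all of the mixture's mass lies at or above 0.
        return stat, 1.0
    # For one degree of freedom: P(chi-square_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_naive = math.erfc(math.sqrt(stat / 2.0))
    return stat, 0.5 * p_naive
```

With an estimated site SD as small as the one in Leonel's first example, the two log-likelihoods would be expected to be nearly equal, giving a statistic near zero and a p-value near 1.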
>> It is always possible to discuss how much non-data information to
>> include in an analysis and I believe the answer depends very much on
>> the purpose of the research. If the research question regards the size
>> and "existence" of the variance of 'Site', then he might conclude that
>> it is so small compared to other effects in the model/data that it
>> has no place in the model.
>> I think the question regarding "existence" of some effect can be
>> misleading in many cases, because one can always claim that any effect
>> is really there, and had we observed enough data, we would be able to
>> estimate the effect reliably. Leaving too many variables in the model
>> on which there is too little information also results in bias in
>> parameter estimates, so it is a trade-off. We often speak of
>> appropriate models, but the appropriateness depends on the purpose -
>> do we seek inference for a specific (set of) parameter(s), the system
>> as a whole or do we want to use it for prediction?
>>> If you do anyway want a Bayesian credible interval, which you can
>>> treat pretty much as a confidence interval, for the random effect,
>>> check Douglas Bates' message of a few hours ago, the first of two
>>> messages with the subject "lme4::mcmcsamp + coda::HPDinterval",
>>> regarding the use of the function HPDinterval().
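HPDinterval() in coda returns a highest posterior density interval from MCMC draws. This matters for variance parameters, whose posteriors are typically right-skewed, so the shortest interval differs from the equal-tailed one. The Python sketch below mirrors the shortest-interval idea as I understand it. It is not coda's code: sort the draws, slide a window spanning a fixed number of positions, and keep the narrowest window. The function name and the draws it takes are assumptions. In practice the draws would be, for example, samples of the site variance from lme4's mcmcsamp().

```python
def hpd_interval(samples, prob=0.95):
    """Shortest interval containing about `prob` of the draws (single parameter)."""
    vals = sorted(samples)
    n = len(vals)
    if n < 2:
        raise ValueError("need at least two draws")
    # Number of positions the window spans, kept within [1, n - 1].
    gap = max(1, min(n - 1, round(n * prob)))
    # Slide the window over the sorted draws and keep the narrowest one
    # (the first one wins if several are equally narrow).
    best = min(range(n - gap), key=lambda i: vals[i + gap] - vals[i])
    return vals[best], vals[best + gap]
```

For a skewed sample such as `[1, 2, 2, 2, 3, 3, 4, 9, 20, 50]` with `prob=0.5`, the narrowest window is `(1, 3)`, which sits in the dense lower part of the draws rather than being centred on the median.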
>>> John Maindonald email: john.maindonald at anu.edu.au
>>> phone : +61 2 (6125)3473 fax : +61 2(6125)5549
>>> Centre for Mathematics & Its Applications, Room 1194,
>>> John Dedman Mathematical Sciences Building (Building 27)
>>> Australian National University, Canberra ACT 0200.
>>> On 2 Apr 2008, at 4:02 AM, Leonel Arturo Lopez Toledo wrote:
>>>> Dear all:
>>>> I'm new to mixed models and I'm trying to understand the output of
>>>> "lme" in the nlme package. I hope my question is not too basic for
>>>> this mailing list; I'm really sorry if that is the case.
>>>> In particular, I have trouble interpreting the random effect output.
>>>> I have only one random factor, which is "Site". I know the "Variance"
>>>> and "StdDev" indicate the variation due to the random factor, but do
>>>> they indicate any significance? Is there any way to obtain a p-value
>>>> for the random effects? And in case it is not significant, how can I
>>>> remove it from the model? With "update (model,~.-)"?
>>>> The variance in the first example (see below) is very low and in the
>>>> second example it is more considerable, but should I keep it in the
>>>> model or remove it?
>>>> Thank you very much in advance for your help.
>>>> EXAMPLE 1
>>>> Linear mixed-effects model fit by maximum likelihood
>>>> Data: NULL
>>>>      AIC      BIC    logLik
>>>> 277.8272 287.3283 -132.9136
>>>> Random effects:
>>>> Formula: ~1 | Sitio
>>>>          (Intercept) Residual
>>>> StdDev: 0.0005098433 9.709515
>>>> EXAMPLE 2
>>>> Generalized linear mixed model fit using Laplace
>>>> Formula: y ~Canopy*Area + (1 | Sitio)
>>>> Data: tod
>>>> Family: binomial(logit link)
>>>>   AIC   BIC logLik deviance
>>>> 50.93 54.49 -21.46    42.93
>>>> Random effects:
>>>> Groups Name        Variance Std.Dev.
>>>> Sitio  (Intercept) 0.25738  0.50733
>>>> number of obs: 18, groups: Sitio, 6
>>>> Leonel Lopez
>>>> Centro de Investigaciones en Ecosistemas-UNAM
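One way to put "very low" and "more considerable" on a common footing is the share of total variance attributable to sites (an intraclass correlation). This gauges size, not significance. The Python sketch below uses only numbers printed in the two examples above. For the binomial model it assumes the common latent-scale convention of taking pi^2/3 as the residual variance of the logit link, which is one of several possible choices. The function names are hypothetical.

```python
import math


def icc_gaussian(sd_site, sd_resid):
    """Share of total variance due to sites in a Gaussian random-intercept model."""
    return sd_site ** 2 / (sd_site ** 2 + sd_resid ** 2)


def icc_logit_latent(var_site):
    """Latent-scale share for a logit GLMM, assuming pi^2/3 as the residual variance."""
    return var_site / (var_site + math.pi ** 2 / 3)


icc1 = icc_gaussian(0.0005098433, 9.709515)  # EXAMPLE 1: StdDev values from lme
icc2 = icc_logit_latent(0.25738)             # EXAMPLE 2: Sitio variance from the GLMM
print(f"EXAMPLE 1: {icc1:.1e}  EXAMPLE 2: {icc2:.3f}")
```

The first example's site share is on the order of 1e-9, which is effectively zero. The second is roughly 0.07 on the latent scale, which is small but not negligible with only 6 sites and 18 observations.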
>>>> R-sig-mixed-models at r-project.org mailing list
> Dr. Hank Stevens, Assistant Professor
> 338 Pearson Hall
> Botany Department
> Miami University
> Oxford, OH 45056
> Office: (513) 529-4206
> Lab: (513) 529-4262
> FAX: (513) 529-4243
> "If the stars should appear one night in a thousand years, how would
> men believe and adore." -Ralph Waldo Emerson, writer and philosopher