[R-sig-ME] how to know if random factors are significant?

Wed Apr 2 09:35:59 CEST 2008

On 02/04/2008, John Maindonald <john.maindonald at anu.edu.au> wrote:
> There was a related question from Mariana Martinez a day or two ago.
>  Before removing a random term that background knowledge or past
>  experience with similar data suggests is likely, check what difference
>  it makes to the p-values for the fixed effects that are of interest.
>  If it makes a substantial difference, caution demands that it be left
>  in.
>
>  To pretty much repeat my earlier comment:
>  If you omit the component then you have to contemplate the alternatives:
>  1) the component really was present but undetectable
>  2) the component was not present, or so small that it could be
>  ignored, and the inference from the model that omits it is valid.
>
>  If (1) has a modest probability, and it matters whether you go with
>  (1) or (2), going with (2) leads to a very insecure inference. The
>  p-value that comes out of the analysis is unreasonably optimistic; it
>  is wrong and misleading.

I think this is a question of strategy. Leonel did put emphasis on the
random effect, and he might just be interested in the size and
significance of the random effect rather than the fixed effects.
Estimating and testing the random effect seems reasonable to me in
this case, although confidence intervals, as you mention below, also
provide good inference.
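
As a rough sketch of what "estimating and testing the random effect"
could look like with nlme: the code below compares a random-intercept
model against the same model without the random effect, using a
likelihood ratio test, and then asks for an interval on the variance
component. Nothing here comes from Leonel's data. The Orthodont data set
bundled with nlme is a stand-in, with Subject playing the role of
'Site', and the names fit_mixed, fit_fixed, p_naive and p_boundary are
made up for illustration. Halving the chi-square p-value is a commonly
used approximation for a variance tested on the boundary of its
parameter space, not an exact result.

```r
library(nlme)  # recommended package, ships with standard R installations

# Stand-in data: Orthodont (bundled with nlme); Subject plays the role of 'Site'.
# Both fits use REML and the same fixed effects, so their likelihoods are comparable.
fit_mixed <- lme(distance ~ age, random = ~ 1 | Subject,
                 data = Orthodont, method = "REML")
fit_fixed <- gls(distance ~ age, data = Orthodont, method = "REML")

# Likelihood ratio test of the random intercept (table with AIC, BIC, logLik, L.Ratio).
cmp <- anova(fit_mixed, fit_fixed)
print(cmp)

# The same statistic by hand.
lrt <- 2 * (as.numeric(logLik(fit_mixed)) - as.numeric(logLik(fit_fixed)))

# Naive p-value: chi-square with 1 df. It tends to be conservative because the
# null value (variance = 0) lies on the boundary of the parameter space.
p_naive <- pchisq(lrt, df = 1, lower.tail = FALSE)

# Common approximation (an assumption here): 50:50 mixture of chi-square(0)
# and chi-square(1), which halves the naive p-value.
p_boundary <- 0.5 * p_naive

# Approximate confidence intervals for the random-effect and residual SDs.
print(intervals(fit_mixed, which = "var-cov"))
```

With a random-effect standard deviation as close to zero as in Example 1,
the two log-likelihoods would be practically identical and the test
would give no evidence for the component; intervals() may also fail or
give very wide limits in that situation.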

It is always possible to discuss how much non-data information to
include in an analysis and I believe the answer depends very much on
the purpose of the research. If the research question concerns the size
and "existence" of the variance of 'Site', then he might conclude that
it is so small compared to other effects in the model/data that it
has no place in the model.

I think the question regarding "existence" of some effect can be
misleading in many cases, because one can always claim that any effect
is really there, and had we observed enough data, we would be able to
estimate the effect reliably. Leaving too many variables in the model
on which there is too little information also results in bias in
parameter estimates, so it is a trade-off. We often speak of
appropriate models, but the appropriateness depends on the purpose:
do we seek inference for a specific (set of) parameter(s) or for the
system as a whole, or do we want to use the model for prediction?

/Rune
>
>  If you do anyway want a Bayesian credible interval, which you can
>  treat pretty much as a confidence interval, for the random component,
>  check Douglas Bates' message of a few hours ago, the first of two
>  messages with the subject "lme4::mcmcsamp + coda::HPDinterval", re the
>  use of the function HPDinterval().
>
>
>  John Maindonald             email: john.maindonald at anu.edu.au
>  phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
>  Centre for Mathematics & Its Applications, Room 1194,
>  John Dedman Mathematical Sciences Building (Building 27)
>  Australian National University, Canberra ACT 0200.
>
>
>
>  On 2 Apr 2008, at 4:02 AM, Leonel Arturo Lopez Toledo wrote:
>
>  > Dear all:
>  > I'm new to mixed models and I'm trying to understand the output from
>  > "lme" in the nlme package. I hope my question is not too basic for
>  > this mailing list; my apologies if it is.
>  > In particular, I have trouble interpreting the random-effects output.
>  > I have only one random factor, which is "Site". I know the "Variance
>  > and Stdev" indicate the variation due to the random factor, but do
>  > they indicate any significance? Is there any way to obtain a p-value
>  > for the random effects? And if it is not significant, how can I
>  > remove it from the model? With "update (model,~.-)"?
>  >
>  > The variance in the first case (see below) is very low, and in the
>  > second example it is more considerable, but should I keep it in the
>  > model or remove it?
>  >
>  > Thank you very much for your help in advance.
>  >
>  > EXAMPLE 1
>  > Linear mixed-effects model fit by maximum likelihood
>  > Data: NULL
>  >       AIC      BIC    logLik
>  >  277.8272 287.3283 -132.9136
>  >
>  > Random effects:
>  > Formula: ~1 | Sitio
>  >          (Intercept) Residual
>  > StdDev: 0.0005098433 9.709515
>  >
>  > EXAMPLE 2
>  > Generalized linear mixed model fit using Laplace
>  > Formula: y ~Canopy*Area + (1 | Sitio)
>  >   Data: tod
>  > Family: binomial(logit link)
>  >   AIC   BIC logLik deviance
>  > 50.93 54.49 -21.46    42.93
>  >
>  > Random effects:
>  > Groups Name        Variance Std.Dev.
>  > Sitio  (Intercept) 0.25738  0.50733
>  > number of obs: 18, groups: Sitio, 6
>  >
>  >
>  > Leonel Lopez
>  > Centro de Investigaciones en Ecosistemas-UNAM
>  > MEXICO
>  >
>  > _______________________________________________
>  > R-sig-mixed-models at r-project.org mailing list
>  > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models