[R] How to test for significance of random effects?

Andrew Robinson A.Robinson at ms.unimelb.edu.au
Tue May 16 06:32:25 CEST 2006


Hi Spencer, Dan,

I think that it depends on the role that the random effects are
playing.  In models that I have fit, random effects can play one or
more of three roles:

1) to reflect the experimental design.  These random effects are
   sacrosanct, as far as I am concerned, and should be included in the
   model whether significant or not.  Therefore, such random effects
   are not tested; they are estimated and reported.

2) to improve the match between models and assumptions.  For example,
   random intercepts can be augmented by random slopes to ensure that
   the diagnostics that reflect the assumptions of normality and
   homoskedasticity are satisfied.  (I once had to use random
   intercepts, slopes, and quadratic terms; I'm sure there are others
   who have had to do worse!).  Testing is rarely of interest in this
   case because the role of the random effects is to extend the model
   so that its assumptions are satisfied.

   However, if interest is in a simple model (for some reason), then
   it might be reasonable to test whether such an innovation
   significantly improves the fit of the model.  For example, I might
   use a whole-model test to assess whether I need a within-subject
   correlation model if the ACF plot is borderline (see the sketch
   after this list).  It's important to recall, though, that the test
   outcomes are predicated on the model assumptions, so interpreting
   the test results when the assumptions are in doubt is a risky
   business.

3) to act as containers for estimating variance components of
   interest.  As for 1), there is really no need to test such random
   effects, because our interest is in estimating the values that they
   represent.
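
To make the testing in 2) concrete, here is a minimal sketch using
the Orthodont data from nlme (model names are illustrative): augment
a random-intercept model with random slopes, and run a whole-model
test for a within-subject correlation structure.  Both comparisons
keep the fixed effects identical and use REML, so the likelihood
ratios are legitimate in the sense of Pinheiro & Bates (2000, sect.
2.4), bearing in mind that p-values for variance components tested
on their boundary are conservative.

library(nlme)

## Random intercepts only, then random intercepts and slopes:
fm1 <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)
fm2 <- lme(distance ~ age, random = ~ age | Subject, data = Orthodont)
anova(fm1, fm2)  # does adding random slopes improve the fit?

## Whole-model test for a within-subject AR(1) correlation:
fm3 <- update(fm1, correlation = corAR1())
anova(fm1, fm3)  # do I need a within-subject correlation model?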

I would be interested to hear of other uses to which random effects
have been put :)

I suggest that the original poster consider computing the intra-class
correlation to show that within-group statistical dependence is
negligible.  However, in the case of GLMMs I am not at all sure that
it retains any meaning.  Computer, beware!
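
For a Gaussian random-intercept model the intra-class correlation is
just the between-group variance as a proportion of the total; a
minimal sketch, using the Rail data that come up later in this
thread:

library(nlme)
fm <- lme(travel ~ 1, random = ~ 1 | Rail, data = Rail)
vc <- as.numeric(VarCorr(fm)[, "Variance"])  # between-group, residual
vc[1] / sum(vc)                              # intra-class correlation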

Cheers

Andrew

On Mon, May 15, 2006 at 08:46:39PM -0700, Spencer Graves wrote:

> 	  I don't know if one should include an apparently
> insignificant random effect in later analysis or not.  In the past,
> I haven't.  As far as I know, the "best" thing to use would be
> "Bayesian model averaging", beginning with some prior over the class
> of all plausible models (with and without the random effect) and
> then averaging predictions, etc., over the posterior.
>
> 	  For more information, you could Google for "Bayesian model
> averaging" and try RSiteSearch("Bayesian model averaging").  I'm not
> aware of any "BMA" software in R for mixed models, but I suspect it
> will be only a matter of time before "BMA" replaces "step" and
> "stepAIC" for stepwise regression-type applications.  Incorporating
> mixed models into this framework will be harder, but I know of no
> theoretical obstacles.
>
> 	  With luck, others will enlighten us both further on this.
> 
> 	  Best Wishes,
> 	  Spencer Graves
> 
> Dan Bebber wrote:
> > I may be out of my statistical depth here, but isn't it the case that if one 
> > has an experimental design with random effects, one has to include the 
> > random effects, even if they appear to be non-significant?
> > AFAIK there are two reasons: one is the possibility of 'restriction
> > errors' that arise from unintentional differences in treatments among
> > groups, making analysis of among-group variance problematic; the
> > other is that the allocation of fixed effects to samples is no longer
> > random, and therefore the assumption of random errors is broken.
> > Real statisticians may disagree with this, however.
> > 
> > Dan Bebber
> > 
> > Department of Plant Sciences
> > University of Oxford
> > 
> > Date: Sun, 07 May 2006 14:25:44 -0700
> > From: Spencer Graves <spencer.graves at pdf.com>
> > Subject: Re: [R] How to test for significance of random effects?
> > 
> >   1.  Ignoring the "complication of logistic regression",
> > "anova(lme1, lm1)" provides the answer you seek.  See sect. 2.4 in
> > Pinheiro and Bates for more detail on the approximations involved and
> > how that answer can be refined using Monte Carlo.
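> > 
> >   For the Monte Carlo refinement, nlme's simulate() method for lme
> > fits can simulate the null distribution of the likelihood ratio
> > statistic.  A rough, untested sketch for comparing two lme fits
> > (model names are illustrative; see ?simulate.lme for the exact
> > interface):
> > 
> > library(nlme)
> > fm1 <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)
> > fm2 <- lme(distance ~ age, random = ~ age | Subject, data = Orthodont)
> > simLRT <- simulate(fm1, m2 = fm2, nsim = 500)  # simulate under the null
> > plot(simLRT)  # compare nominal and empirical p-values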
> > 
> >   2.  With logistic regression, you want to do essentially the same
> > thing using glm and lmer (in package 'lme4'), except that many of the
> > required functions are not yet part of 'lme4'.  Consider the following
> > example:
> > 
> > library(lme4)
> > library(mlmRev)  # provides the Contraception data
> > (mlmR <- vignette("MlmSoftRev"))
> > #edit(mlmR)         # with Rgui
> > #Stangle(mlmR$file) # with ESS
> > #   -> then open file MlmSoftRev.R
> > 
> > ## GLMM with a random intercept for district, and the corresponding
> > ## fixed-effects-only GLM:
> > fitBin <- lmer(use ~ urban + age + livch + (1 | district),
> >                data = Contraception, family = binomial)
> > fitBin0 <- glm(use ~ urban + age + livch,
> >                data = Contraception, family = binomial)
> > 
> > ## Likelihood ratio test for the random effect:
> > 2 * pchisq(2 * as.numeric(logLik(fitBin) - logLik(fitBin0)),
> >            2, lower.tail = FALSE)
> > 
> >   Note, however, that this p-value computation is known to be only
> > an approximation; see RSiteSearch("lmer p-values") for other
> > perspectives.  More accurate p-values can be obtained using Markov
> > chain Monte Carlo, via "mcmcsamp".
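> > 
> >   A rough, untested sketch of the MCMC route (the 'mcmcsamp'
> > interface may change, and it may not yet cover all model families):
> > 
> > samp <- mcmcsamp(fitBin, n = 1000)  # posterior samples of the parameters
> > library(coda)
> > HPDinterval(samp)  # highest posterior density intervals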
> > 
> >   hope this helps,
> >   Spencer Graves
> > 
> > Jon Olav Vik wrote:
> >> Dear list members,
> >>
> >> I'm interested in showing that within-group statistical dependence is
> >> negligible, so I can use ordinary linear models without including random
> >> effects. However, I can find no mention of testing a model with vs.
> >> without random effects in either Venables & Ripley (2002) or Pinheiro and
> >> Bates (2000). Our in-house statisticians are not familiar with this,
> >> either, so I would greatly appreciate the help of this list.
> >>
> >> Pinheiro & Bates (2000:83) state that random-effect terms can be tested
> >> based on their likelihood ratio, if both models have the same
> >> fixed-effects structure and both are estimated with REML (I must admit I
> >> do not know exactly what REML is, although I do understand the concept of
> >> ML).
> >>
> >> The examples in Pinheiro & Bates (2000) deal with simple vs. complicated
> >> random-effects structures, both fitted with lme and method="REML".
> >> However, to fit a model without random effects I must use lm() or glm().
> >> Is there a way to tell these functions to use REML? I see that lme() can
> >> use ML, but Pinheiro & Bates (2000) advised against this for some reason.
> >>
> >> lme() does provide a confidence interval for the between-group variance,
> >> but this is constructed so as to never include zero (I guess the interval
> >> is as narrow as possible on log scale, or something). I would be grateful
> >> if anyone could tell me how to test for zero variance between groups.
> >>
> >> If lm1 and lme1 are fitted with lm() and lme() respectively, then
> >> anova(lm1,lme1) gives an error, whereas anova(lme1,lm1) gives an answer
> >> which looks reasonable enough.
> >>
> >> The command logLik() can retrieve either restricted or ordinary
> >> log-likelihoods from a fitted model object, but the likelihoods are then
> >> evaluated at the fitted parameter estimates. I guess these estimates
> >> differ from those obtained if the model were estimated using REML?
> >>
> >> My actual application is a logistic regression with two continuous and one
> >> binary predictor, in which I would like to avoid the complications of
> >> using generalized linear mixed models. Here is a simpler example, which is
> >> rather trivial but illustrates the general question:
> >>
> >> Example (run in R 2.2.1):
> >>
> >> library(nlme)
> >> summary(lm1 <- lm(travel ~ 1, data = Rail))  # no random effect
> >> summary(lme1 <- lme(fixed = travel ~ 1,
> >>                     random = ~ 1 | Rail, data = Rail))  # random effect
> >> intervals(lme1)  # confidence interval for the random effect
> >> anova(lm1,lme1)
> >> ## Outputs warning message:
> >> # models with response "NULL" removed because
> >> # response differs from model 1 in: anova.lmlist(object, ...)
> >> anova(lme1,lm1)
> >> ## Output: Can I trust this?
> >> #      Model df      AIC      BIC    logLik   Test  L.Ratio p-value
> >> # lme1     1  3 128.1770 130.6766 -61.08850
> >> # lm1      2  2 162.6815 164.3479 -79.34075 1 vs 2 36.50451  <.0001
> >> ## Various log likelihoods:
> >> logLik(lm1,REML=FALSE)
> >> logLik(lm1,REML=TRUE)
> >> logLik(lme1,REML=FALSE)
> >> logLik(lme1,REML=TRUE)
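> >>
> >> I can also compute the comparison by hand after refitting with ML
> >> (so the likelihood is comparable to lm()'s), though I am unsure
> >> whether the chi-squared reference applies when the variance is on
> >> the boundary:
> >>
> >> lme1ML <- update(lme1, method = "ML")
> >> lrt <- 2 * (logLik(lme1ML) - logLik(lm1))
> >> pchisq(as.numeric(lrt), df = 1, lower.tail = FALSE)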
> >>
> >> Any help is highly appreciated.
> >>
> >> Best regards,
> >> Jon Olav Vik
> >>

-- 
Andrew Robinson  
Department of Mathematics and Statistics            Tel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
Email: a.robinson at ms.unimelb.edu.au         http://www.ms.unimelb.edu.au



