# [R-sig-ME] Beginner help setting up a mixed model

Douglas Bates bates at stat.wisc.edu
Tue Oct 12 23:03:31 CEST 2010

```On Tue, Oct 12, 2010 at 3:08 PM, Christopher Desjardins
<desja004 at umn.edu> wrote:
> Hi Beth,
>
> On Tue, Oct 12, 2010 at 2:03 PM, Beth Holbrook <bvholbi at yahoo.com> wrote:
>
>> I am new to both R and mixed models, so I apologize in advance about the
>> simplistic nature of my question.
>>
>> My dataset is unbalanced and have been I have been advised by the
>> statistician
>> on my Ph.D. committee to run a mixed model.  The response variable is
>> distance.
>> The factors are preyswim, light, and trial.  I am interested in the effects
>> of
>> light and preyswim on distance.  Trial is a random factor and there is also
>> a
>> crossed random effect of preyswim*trial.  In SAS, the code would look like
>> this:
>>
>> Proc mixed data=laketrout;
>> Class preyswim light trial;
>> Model distance = light | preyswim;
>> Random trial trial*preyswim;
>> Run;
>>
>> My understanding is that to test the full ANOVA model in R, I would need to
>> code
>> three separate models and determine the most appropriate.  This is how I
>> attempted to code the same model using the lme4 package in R:
>>
>> options(contrasts=c(unordered="contr.SAS", ordered="contr.poly"))
>> mixed1 <- lmer(distance~preyswim*light + (1|trial) + (1|preyswim),
>> laketrout,
>> REML=FALSE)
>>
>
> I am not sure this is the model you want to run. If you want a crossed
> random effect I believe you need something like the following:
>
> mixed1 <- lmer(distance~ + (1|trial) + (1|preyswim) + (1|preyswim:light),
> laketrout,REML=FALSE)
>
> But then you no longer have any fixed effects. So I am not sure. I wonder if
> you want this based on what you said above...
>
> mixed1 <- lmer(distance~ preyswim + light + (1|trial) + (1|preyswim:light),
> laketrout,REML=FALSE)
>
> or ...
>
> mixed1 <- lmer(distance~ preyswim + light + preyswim*light + (1|trial),
> laketrout,REML=FALSE)
>
> But that model doesn't treat preyswim*light as a crossed random effect
>
>
> mixed2 <-lmer(distance~preyswim + light + (1|trial) + (1|preyswim),
>> laketrout,
>> REML=FALSE)
>> mixed3 <-lmer(distance~preyswim + (1|trial) + (1|preyswim), laketrout,
>> REML=FALSE)
>>
>> My questions are:
>>
>> 1.  Did I correctly set up the mixed model in R?
>> 2.  When determining the most appropriate model in R (I used the anova
>> function
>> to compare the three models), does it matter whether I use the AIC/BIC or
>> log-likelihood criterion?  Along those lines, I'm unsure whether I need to
>> use
>> REML=TRUE or REML=FALSE in my code.
>>
>
> The anova function would work here because your models are nested. I believe
> it's quite common to use the AIC and use the criterion of Burnham and
> Anderson, 2002. Also, I believe you want REML=TRUE when comparing models
> with different random effects.

Hmm - I think you have that backwards.  When comparing models with
different *fixed* effects you must use REML=FALSE.  The REML criterion
depends on the structure of the fixed-effects model matrix and you
can't easily compare values of the REML criterion for models with
different fixed-effects structure.  There may be advice to use
REML=TRUE when the random effects change but, if so, I don't know why.

My current attitude is that the usual justification for using REML
(reducing the bias in estimates of variance components) isn't as
important as many people believe, because the distribution of the
estimator can be highly skewed and why should we be overwhelmingly
interested in the mean value of a highly skewed distribution?  I
prefer to stay with maximum likelihood fits so that likelihood ratio
tests and derived criterion like AIC and BIC are well-defined.

>> I no longer have access to SAS although it is the preferred statistical
>> program
>> used by my committee statistician.  I understand that there are some
>> limitations
>> to SAS anyway, particularly when calculating p-values for mixed models with
>> random effects.

Not sure I understand that statement.  All mixed models have random
effects.  The definition of a mixed models is that it incorporates
both fixed-effects parameters and random effects.

You mentioned p-values and I think you are reversing SAS and lme4
there.  Many people consider it to be a deficiency of the lme4 package
(which is different from R itself - you can blame SAS Institute for
everything that is in SAS but you should not blame the R Foundation
for everything that is in contributed R packages) that it does not
give p-values for fixed-effects coefficients in a model fit by lmer.
The preferred approach with lme4 is to fit the model with and without
a particular term and use a likelihood ratio test to compare the
quality of the fits.  SAS PROC MIXED, on the other hand, is happy to
provide p-values.  In fact, SAS seem to pride themselves on providing
many different p-values for the same test, even when the test wouldn't
make sense.

>> My preference is to use R for my statistical analyses, as
>> long
>> as I know that I'm using it correctly.

>>
>>
>> Thanks so much for your help,
>> -Beth
>>
>> Beth Holbrook
>> University of Minnesota
>> bvholbi at yahoo.com
>>
>>
>>
>>
>>        [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
>
>
> --
> Christopher David Desjardins
> Ph.D. student, Quantitative Methods in Education
> M.S. student, Statistics
> University of Minnesota
>
>        [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

```