# [R-sig-ME] Beginner help setting up a mixed model

Christopher David Desjardins desja004 at umn.edu
Wed Oct 13 01:37:20 CEST 2010

```  On 10/12/10 4:03 PM, Douglas Bates wrote:
> On Tue, Oct 12, 2010 at 3:08 PM, Christopher Desjardins
> <desja004 at umn.edu>  wrote:
>> Hi Beth,
>>
>> On Tue, Oct 12, 2010 at 2:03 PM, Beth Holbrook<bvholbi at yahoo.com>  wrote:
>>
>>> I am new to both R and mixed models, so I apologize in advance about the
>>> simplistic nature of my question.
>>>
>>> My dataset is unbalanced and have been I have been advised by the
>>> statistician
>>> on my Ph.D. committee to run a mixed model.  The response variable is
>>> distance.
>>> The factors are preyswim, light, and trial.  I am interested in the effects
>>> of
>>> light and preyswim on distance.  Trial is a random factor and there is also
>>> a
>>> crossed random effect of preyswim*trial.  In SAS, the code would look like
>>> this:
>>>
>>> Proc mixed data=laketrout;
>>> Class preyswim light trial;
>>> Model distance = light | preyswim;
>>> Random trial trial*preyswim;
>>> Run;
>>>
>>> My understanding is that to test the full ANOVA model in R, I would need to
>>> code
>>> three separate models and determine the most appropriate.  This is how I
>>> attempted to code the same model using the lme4 package in R:
>>>
>>> options(contrasts=c(unordered="contr.SAS", ordered="contr.poly"))
>>> mixed1<- lmer(distance~preyswim*light + (1|trial) + (1|preyswim),
>>> laketrout,
>>> REML=FALSE)
>>>
>> I am not sure this is the model you want to run. If you want a crossed
>> random effect I believe you need something like the following:
>>
>> mixed1<- lmer(distance~ + (1|trial) + (1|preyswim) + (1|preyswim:light),
>> laketrout,REML=FALSE)
>>
>> But then you no longer have any fixed effects. So I am not sure. I wonder if
>> you want this based on what you said above...
>>
>> mixed1<- lmer(distance~ preyswim + light + (1|trial) + (1|preyswim:light),
>> laketrout,REML=FALSE)
>>
>> or ...
>>
>> mixed1<- lmer(distance~ preyswim + light + preyswim*light + (1|trial),
>> laketrout,REML=FALSE)
>>
>> But that model doesn't treat preyswim*light as a crossed random effect
>>
>>
>> mixed2<-lmer(distance~preyswim + light + (1|trial) + (1|preyswim),
>>> laketrout,
>>> REML=FALSE)
>>> mixed3<-lmer(distance~preyswim + (1|trial) + (1|preyswim), laketrout,
>>> REML=FALSE)
>>>
>>> My questions are:
>>>
>>> 1.  Did I correctly set up the mixed model in R?
>>> 2.  When determining the most appropriate model in R (I used the anova
>>> function
>>> to compare the three models), does it matter whether I use the AIC/BIC or
>>> log-likelihood criterion?  Along those lines, I'm unsure whether I need to
>>> use
>>> REML=TRUE or REML=FALSE in my code.
>>>
>> The anova function would work here because your models are nested. I believe
>> it's quite common to use the AIC and use the criterion of Burnham and
>> Anderson, 2002. Also, I believe you want REML=TRUE when comparing models
>> with different random effects.
> Hmm - I think you have that backwards.  When comparing models with
> different *fixed* effects you must use REML=FALSE.  The REML criterion
> depends on the structure of the fixed-effects model matrix and you
> can't easily compare values of the REML criterion for models with
> different fixed-effects structure.  There may be advice to use
> REML=TRUE when the random effects change but, if so, I don't know why.
Yes I mistyped and got that backwards.
Chris
> My current attitude is that the usual justification for using REML
> (reducing the bias in estimates of variance components) isn't as
> important as many people believe, because the distribution of the
> estimator can be highly skewed and why should we be overwhelmingly
> interested in the mean value of a highly skewed distribution?  I
> prefer to stay with maximum likelihood fits so that likelihood ratio
> tests and derived criterion like AIC and BIC are well-defined.
>
>>> I no longer have access to SAS although it is the preferred statistical
>>> program
>>> used by my committee statistician.  I understand that there are some
>>> limitations
>>> to SAS anyway, particularly when calculating p-values for mixed models with
>>> random effects.
> Not sure I understand that statement.  All mixed models have random
> effects.  The definition of a mixed models is that it incorporates
> both fixed-effects parameters and random effects.
>
> You mentioned p-values and I think you are reversing SAS and lme4
> there.  Many people consider it to be a deficiency of the lme4 package
> (which is different from R itself - you can blame SAS Institute for
> everything that is in SAS but you should not blame the R Foundation
> for everything that is in contributed R packages) that it does not
> give p-values for fixed-effects coefficients in a model fit by lmer.
> The preferred approach with lme4 is to fit the model with and without
> a particular term and use a likelihood ratio test to compare the
> quality of the fits.  SAS PROC MIXED, on the other hand, is happy to
> provide p-values.  In fact, SAS seem to pride themselves on providing
> many different p-values for the same test, even when the test wouldn't
> make sense.
>
>>> My preference is to use R for my statistical analyses, as
>>> long
>>> as I know that I'm using it correctly.
>>>
>>> Thanks so much for your help,
>>> -Beth
>>>
>>> Beth Holbrook
>>> University of Minnesota
>>> bvholbi at yahoo.com
>>>
>>>
>>>
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>
>>
>> --
>> Christopher David Desjardins
>> Ph.D. student, Quantitative Methods in Education
>> M.S. student, Statistics
>> University of Minnesota
>>
>>         [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>

```