[R-sig-ME] P-values from interaction terms using lme4

Wed Aug 9 16:58:23 CEST 2017

Re'CCing the list after being bounced due to PGP signature.

On 08/09/2017 04:46 PM, Douglas Bates wrote:
> I think you mean "conditional modes", i.e. the point where the
> conditional density of the random effects is maximized, not "conditional
> models".

Yes, that was a typo.

> Technically it is true that the number of parameters does not depend on
> the number of random effects, only on the number of unique values in the
> covariance matrices for the random effects.  However, I think that leads
> to an undercount of the effective number of parameters when, say,
> performing a likelihood ratio test of models that differ in their random
> effects specification.

This is related to the more general issue of degrees of freedom in mixed
models, right?

But in terms of estimation (especially as implemented in
lme4/MixedModels.jl), increasing the levels of a random effect will
generally provide better estimates, right?

Phillip

> 
> On Wed, Aug 9, 2017 at 9:08 AM Phillip Alday <phillip.alday at mpi.nl
> <mailto:phillip.alday at mpi.nl>> wrote:
> 
>     Today is apparently the day for me to follow up on zombie threads:
> 
>     While Alex is right that mixed-effects models are difficult with small
>     sample sizes, there seems to be an implicit assumption that a large
>     number of grouping levels and thus a large number of random intercepts
>     increases the number of model parameters. This isn't correct. The
>     variance of the intercepts is a model parameter, not the intercepts
>     themselves -- the individual intercepts are BLUPs / predictions /
>     conditional models and not estimates / model parameters. This is the
>     difference to including subject-id as a covariate in a classical
>     fixed-effects regression. Indeed, increasing the number of groups
>     generally helps in mixed-effects regression because variance estimates
>     are quite sensitive to small sample sizes. (@Ben Bolker and others if
>     I've messed this up, please correct me!)
> 
>     Your sample size is still nonetheless quite small if you want to model
>     e.g. the variance for playback type (which I think could be important in
>     your model). Using rmANOVA isn't really better than a mixed-effects
>     model here, because rmANOVA *is* an intercepts-only mixed-effects model
>     with additional symmetry assumptions (sphericity)!
> 
>     Compare the output of
>     > summary(aov(Reaction ~ Days + Error(Subject),sleepstudy))
> 
>     Error: Subject
>               Df Sum Sq Mean Sq F value Pr(>F)
>     Residuals 17 250618   14742
> 
>     Error: Within
>                Df Sum Sq Mean Sq F value Pr(>F)
>     Days        1 162703  162703   169.4 <2e-16 ***
>     Residuals 161 154634     960
> 
>     and
> 
>     > anova(lmer(Reaction ~ Days + (1|Subject),sleepstudy))
>     Analysis of Variance Table
>          Df Sum Sq Mean Sq F value
>     Days  1 162703  162703   169.4
> 
>     You can get the residual Mean Sq F in the mixed model by looking at the
>     variance column of the Residual grouping under Random effects in
>     summary()-output.
> 
>     The advantage to mixed-effects models in this case is that you're doing
>     the regression explicitly.
> 
>     Best,
>     Phillip
> 
> 
> 
>     On 06/13/2016 01:43 AM, Sam Hardman [sah74] wrote:
>     > Hi Alex,
>     >
>     >
>     > The reason I was looking at using mixed models is because I have
>     > repeated measures for indivuals (i.e. each bird was tested three times
>     > with three different playback treatments). As far as I know ANOVA
>     can't
>     > control for this. I am wrong about this?
>     >
>     >
>     > You are correct that I have 40 animals in the experiment, with three
>     > treatments each that makes 120 measurements per response behaviour. I
>     > agree with you that the sample size is small which could cause
>     problems
>     > with reliability of results. If ANOVA could work then I will certainly
>     > look at that. How would I control for repeated measures?
>     >
>     >
>     > Thanks for your help, I appreciate it.
>     >
>     > Sam
>     >
>     >
>     ------------------------------------------------------------------------
>     > *Von:* Alex Fine <abfine at gmail.com <mailto:abfine at gmail.com>>
>     > *Gesendet:* Sonntag, 12. Juni 2016 23:10:59
>     > *An:* Phillip Alday
>     > *Cc:* Sam Hardman [sah74]; r-sig-mixed-models at r-project.org
>     <mailto:r-sig-mixed-models at r-project.org>
>     > *Betreff:* Re: [R-sig-ME] P-values from interaction terms using lme4
>     >
>     > Hey Sam,
>     >
>     > I actually think mixed effects regression might be inappropriate
>     in this
>     > case.  Do I have it right that you have 40 total animals in the
>     > experiment?  You said you had 1 bird in the city center and 1 outside
>     > the city in 20 cities.
>     >
>     > If that's right, then the model you described (to take one of your
>     > examples) would be:
>     >
>     > Approach ~ PlayBack * UrbanRural + (1|ID)
>     >
>     > That's 3 * 2 - 1 = 5 coefficients in your model, plus a random
>     intercept
>     > for 20 different cities.  I don't think you have enough data to
>     fit such
>     > a model and trust the results.
>     >
>     > If you have a balanced design and you have clear a priori predictions
>     > about each of the contrasts in your design, I'd recommend just using
>     > ANOVA.  I don't think you gain very much from using MEMs in this case.
>     > You could use vanilla linear regression, but setting up the coding
>     > schemes for your two predictors is going to be so complicated that
>     it'll
>     > actually be much simpler to just do ANOVA + t-tests.
>     >
>     > Alex
>     >
>     > On Sun, Jun 12, 2016 at 4:31 AM, Phillip Alday
>     > <Phillip.Alday at unisa.edu.au <mailto:Phillip.Alday at unisa.edu.au>
>     <mailto:Phillip.Alday at unisa.edu.au
>     <mailto:Phillip.Alday at unisa.edu.au>>> wrote:
>     >
>     >     Hi Sam,
>     >
>     >     if you're getting p-values from lmer outside of a
>     likelihood-ratio test,
>     >     then you're using lmerTest, not lme4. lmerTest is designed to be a
>     >     drop-in replacement for lme4, but it does bring some extra
>     >     features/'complications' in the form of the denominator degrees of
>     >     freedom.
>     >
>     >     There are two easy options for getting ANOVA-style p-values:
>     >
>     >     1. Using lmerTest, you can just do
>     >     anova(mod1,ddf="Satterthwaite",type=2)
>     >
>     >     If you're using vanilla lme4, then you need to do this first:
>     >
>     >     library(lmerTest)
>     >     mod1 <- as(mod1,"merModLmerTest")
>     >
>     >     which will cast your model to an lmerTest model.
>     >
>     >     2. Using the car package:
>     >
>     >     library(car)
>     >     Anova(mod1,test.statistic="F",type=2)
>     >
>     >     If you fitted your model with maximum likelihood, i.e. with
>     REML=FALSE,
>     >     then you need to refit your model using REML first:
>     >
>     >     mod1 <- update(mod1,REML=TRUE)
>     >
>     >     For both options, you need to specify the types of test you're
>     doing. I
>     >     highly recommend Type 2, but there is a lot of material on
>     that debate,
>     >     search for e.g. Venables' "Exegeses on linear models", or look
>     at the
>     >     documentation for car::Anova(). Please note that the
>     distinction for
>     >     Type 2 vs Type 3 is *very* relevant for your question since you're
>     >     concerned about interaction terms.
>     >
>     >     For the lmerTest route, you can specify the approximation to
>     use for the
>     >     denominator degrees of freedom: Satterthwaite is much faster, but
>     >     Kenward-Roger is more accurate. For car::Anova(), the
>     F-statistic is
>     >     always computed with  Kenward-Roger (and only works for
>     REML-fitted
>     >     models for reasons that I can't explain quickly), but you have the
>     >     option of using a Chisq test statistic, which is equivalent to
>     assuming
>     >     that the denominator degrees of freedom are infinite, or
>     equivalently,
>     >     that your t-values for the coefficients are z-values.
>     >
>     >     These ANOVA-style tests are Wald tests and are asymptotically
>     equivalent
>     >     to the LR-tests, but are less conservative for finite samples.
>     >
>     >     Now, since you care about particular contrasts within the
>     model, you may
>     >     also just want to look at your model coefficients. Depending
>     on which
>     >     coding scheme you're using, the contrasts represented by your
>     model
>     >     coefficients might not be the ones you want, but packages like
>     lsmeans
>     >     (a regular mention on the list here and very well documented) can
>     >     compute all types of contrasts post-hoc. Rereading your
>     question, this
>     >     may be the best way to go for your target conclusion/result
>     statements.
>     >
>     >     Best,
>     >     Phillip
>     >
>     >
>     >     On Sat, 2016-06-11 at 16:41 +0000, Sam Hardman [sah74] wrote:
>     >     > Dear all,
>     >     >
>     >     >
>     >     > I have some data which I would like to analyse using lme4 and I
>     >     would really appreciate some help deciding what the best
>     method is.
>     >     >
>     >     >
>     >     > My experiment is as follows:
>     >     >
>     >     >
>     >     > I tested the responses of urban and rural great tits to
>     playbacks
>     >     of great tit song from a loud speaker within their territories.
>     >     >
>     >     >
>     >     > I created three playback song types:
>     >     >
>     >     > -undegraded
>     >     >
>     >     > -degraded
>     >     >
>     >     > -very degraded
>     >     >
>     >     >
>     >     > I played these to birds in 20 different cities. In each city I
>     >     tested one bird in the city centre and one bird in a rural
>     location
>     >     outside of the city (i.e. paired samples). Each bird received all
>     >     three playback types.
>     >     >
>     >     >
>     >     > I measured five different responses to these playbacks:
>     >     >
>     >     > -Time to sing back to playback (in seconds)
>     >     >
>     >     > -Time to sing back to playback (in seconds)
>     >     >
>     >     > -Time the bird spent within five metres of the speaker (in
>     seconds)
>     >     >
>     >     > -Number of times the bird flew over the speaker (count)
>     >     >
>     >     > -The closest approach the bird made to the speaker (in metres)
>     >     >
>     >     >
>     >     > For each of these five repsonses I would like to know if
>     there is
>     >     an interaction between habitat and playback type.
>     >     >
>     >     >
>     >     > So, I have a model which looks like this:
>     >     >
>     >     >
>     >     > mod1<-lmer(repsonse ~ Playback*UR + (1|ID))
>     >     >
>     >     >
>     >     > Where response is one of the five repsonse behaviours,
>     playback is
>     >     the playback type, UR is habitat type (urban or rural) and ID
>     is the
>     >     ID of the bird.
>     >     >
>     >     >
>     >     > This gives me results and P-Values but am not sure these
>     P-values
>     >     are valid and I think I should compare this model to a null
>     model to
>     >     get a valid P-values.
>     >     >
>     >     >
>     >     > So I can use a likelihood ratio tests to test for differences in
>     >     response by habitat type alone:
>     >     >
>     >     >
>     >     > mod1<-lmer(approach ~ UR + (1|Location))
>     >     > mod2<-lmer(approach ~ 1 + (1|Location))
>     >     > anova(mod1, mod2)
>     >     >
>     >     > or for differences in response according tom playback type
>     alone:
>     >     >
>     >     > mod1<-lmer(approach ~ Playback + (1|Location))
>     >     > mod2<-lmer(approach ~ 1 + (1|Location))
>     >     > anova(mod1, mod2)
>     >     >
>     >     > But how should I do this when there is an interaction term? I
>     >     deally I would like P-values for each playback type in interaction
>     >     with habitat. e.g.
>     >     >
>     >     > Undregraded playback * Habitat (urban/rural
>     >     > Degraded playback * Habitat (urban/rural)
>     >     > Very degraded playback * Habitat (urban/rural)
>     >     >
>     >     > This would allow me to say, for example, something like "urban
>     >     birds approached the speaker more closely than rural birds in
>     >     response to undegraded playbacks". I would like to do this
>     with each
>     >     of the five response behaviours.
>     >     >
>     >     > I would really appreciate any suggestions for the best way
>     forward
>     >     with this and apologies if this question is too simple for
>     this group.
>     >     >
>     >     > Best wishes,
>     >     > Sam
>     >     >
>     >     >
>     >     >
>     --------------------------------------------------------------------
>     >     > Aberystwyth - Prifysgol Gyntaf Cymru https://www.aber.ac.uk/cy/
>     >     >
>     >     > Aberystwyth - Wales' First University https://www.aber.ac.uk/en/
>     >     >
>     >     >       [[alternative HTML version deleted]]
>     >     >
>     >     > _______________________________________________
>     >     > R-sig-mixed-models at r-project.org
>     <mailto:R-sig-mixed-models at r-project.org>
>     >     <mailto:R-sig-mixed-models at r-project.org
>     <mailto:R-sig-mixed-models at r-project.org>> mailing list
>     >     > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>     >
>     >     _______________________________________________
>     >     R-sig-mixed-models at r-project.org
>     <mailto:R-sig-mixed-models at r-project.org>
>     >     <mailto:R-sig-mixed-models at r-project.org
>     <mailto:R-sig-mixed-models at r-project.org>> mailing list
>     >     https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>     >
>     >
>     >
>     >
>     > --
>     > Alex Fine
>     > Ph. (336) 302-3251 <tel:(336)%20302-3251>
>     > web:  http://internal.psychology.illinois.edu/~abfine/
>     > <http://internal.psychology.illinois.edu/~abfine/AlexFineHome.html>
>     >
>     >
>     > --------------------------------------------------------------------
>     > Aberystwyth - Prifysgol Gyntaf Cymru https://www.aber.ac.uk/cy/
>     >
>     > Aberystwyth – Wales’ First University https://www.aber.ac.uk/en/
> 
>     _______________________________________________
>     R-sig-mixed-models at r-project.org
>     <mailto:R-sig-mixed-models at r-project.org> mailing list
>     https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>