[R-sig-ME] lmer: effects of forcing fixed intercepts and slopes
ONKELINX, Thierry
Thierry.ONKELINX at inbo.be
Wed Nov 7 10:00:52 CET 2012
Mixed models are not that scary. I would recommend reading Zuur et al. (2009). It was written with 'mainstream researchers' (in ecology) in mind. It starts with simple linear models and gradually adds complexity (glm, gam, lmm, glmm, gamm, ...).
Dear list,
Thierry, great, thank you very much for your quick reply! I will drop moment as a random slope, and read up on the different hypotheses that are being tested.
I have one more question. Basically, I have no background in multilevel modelling (as you may have guessed :-)). The reason I'm 'in over my head' like this is that I basically want to 'use the proper analysis' for my data, and the only method is apparently mixed models. "All I want" is the simplest statistically decent way to test whether cannabis use at the second measurement moment is different in the group that received the intervention as compared to the group that didn't.
However, when I try to learn about mixed models, the sources I encounter approach the modelling practice very differently. They seem to be about much more advanced issues; whether random intercepts and slopes should be included, and for which variables, etc (to stick to those issues that I at least kind of understand). Apparently, either mixed models are only used by people who are statistically much more advanced (i.e. there's a gap between 'mainstream researchers' and the people who understand and use mixed models), or in fact these sources _do_ discuss the same things, but in mixed models the terminology just differs a lot from what you encounter in more basic statistical textbooks.
I basically have the idea that although my requirements are very basic, I have to learn lots of dark arcane issues to be able to do this properly. This is kind of 'scary', as, for example, matrix algebra is, well, scary :-)
What do people here think of this? Are mixed models just something you should avoid unless you're able & willing to really delve into their statistical innards?
Again, thank you very much, kind regards,
Gjalt-Jorn
>
> Dear all,
>
> I run into something I don't understand: I update a model with some terms; none of the terms is significant; but the model suddenly fits A LOT better . . .
>
> The background: I am running a model to test a relatively simple
> hypothesis: that an intervention aiming to reduce cannabis use is effective. It's a repeated measures design where we measured cannabis use of each student before and after the intervention. In addition to having repeated measures, students are nested in schools. A simple plot of the percentage of cannabis users before and after the intervention, in the control and the intervention groups, is at http://sciencerep.org/files/7/plot.png (this plot ignores the schools).
>
> This is the datafile:
>
> <R CODE>
> ### Load data
> dat.long <-
> read.table("http://sciencerep.org/files/7/the%20cannabis%20show%20-%20data%20in%20long%20format.tsv",
> header=TRUE, sep = "\t");
>
> ### Set 'participant' as factor
> dat.long$participant <- factor(dat.long$id);
>
> head(dat.long);
> </R CODE>
>
> This is what the head looks like:
>
> id moment school cannabisShow gender age usedCannabis_bi participant
> 1 1 before Zuidoost Intervention 2 NA NA 1
> 2 2 before Zuidoost Intervention 2 NA 0 2
> 3 3 before Zuidoost Intervention 1 NA 1 3
> 4 4 before Noord Intervention NA NA NA 4
> 5 5 before Noord Intervention NA NA 1 5
> 6 6 before Noord Intervention 1 NA NA 6
>
> 'school' has 8 levels;
> 'moment' has 2 levels ('before' and 'after'); 'cannabisShow' has 2 levels, 'Intervention' and 'Control'; 'usedCannabis_bi' has 2 levels, 0 and 1; and 'participant' is the participant identifier.
>
> I run a null model and a 'real' model, comparing the fit. These are the formulations I use:
>
> <R CODE>
> rep_measures.1.null <- lmer(formula = usedCannabis_bi ~
> 1 + moment + (1 + moment | school / participant),
> family=binomial(link = "logit"), data=dat.long);
> rep_measures.1.model <- update(rep_measures.1.null, .~. + moment*cannabisShow);
> rep_measures.1.null;
> rep_measures.1.model;
> anova(rep_measures.1.null, rep_measures.1.model);
> </R CODE>
>
> The second model, where I introduce the interaction between measurement moment and whether participants received the intervention (this should reflect an effect of the intervention), fits considerably better than the original model. But, the interaction is not significant. In fact, none of the fixed effects is - so I added terms to the model, none of these terms significantly contributes to the prediction of cannabis use, yet the model fits a lot better.
>
> This seems to be a paradox. Could anybody maybe explain how this is possible?
>
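One detail of the update() call above can be checked without fitting anything: `moment*cannabisShow` is R formula shorthand for both main effects plus their interaction, so the update adds two fixed-effect terms, and anova() then compares the two models on both terms at once rather than on the interaction alone. A minimal base-R sketch, using only the fixed-effects part of the formula (leaving out the random-effects term is a simplification made here, and `null_formula`, `full_formula` and `added_terms` are illustrative names, not from the original script):

```r
# Fixed-effects part of the null model only; the random-effects term
# (1 + moment | school / participant) is deliberately omitted in this sketch.
null_formula <- usedCannabis_bi ~ 1 + moment

# Same update as in the post: add moment * cannabisShow.
full_formula <- update(null_formula, . ~ . + moment * cannabisShow)

# Compare the term labels of the two formulas to see what was actually added.
null_terms  <- attr(terms(null_formula), "term.labels")
full_terms  <- attr(terms(full_formula), "term.labels")
added_terms <- setdiff(full_terms, null_terms)

print(full_formula)
print(added_terms)
```

The sketch only shows which terms enter the comparison; whether that accounts for the discrepancy between the anova() result and the individual tests is not something it can show.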
> I also looked at the situation where I impose fixed intercepts and slopes on the participant level (so intercepts and slopes could only vary per school):
>
> <R CODE>
> rep_measures.2.null <- lmer(formula = usedCannabis_bi ~
> 1 + moment + (1 + moment | school),
> family=binomial(link = "logit"), data=dat.long);
> rep_measures.2.model <- update(rep_measures.2.null, .~. + moment*cannabisShow);
> rep_measures.2.null;
> rep_measures.2.model;
> anova(rep_measures.2.null, rep_measures.2.model);
> </R CODE>
>
> Now the interaction between 'measurement moment' and 'intervention' is significant, as I expected; but the improvement in fit between the null model and the 'full model' is much, much smaller.
>
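When comparing the two specifications, it may help to see what the `/` in `school / participant` stands for. In R formula syntax `a / b` is shorthand for `a + a:b`, and lme4 documents the same expansion for grouping factors, so the first model carries random intercepts and slopes at two levels (school, and participant within school), whereas the second keeps only the school level. A small base-R sketch of the expansion (the name `nested_terms` is illustrative, and the one-sided formula is only a stand-in for the grouping part of the random-effects term):

```r
# Expand the nesting operator with base R's formula machinery.
# No data are needed: terms() only parses the formula symbolically.
nested_terms <- attr(terms(~ school / participant), "term.labels")

# One label per grouping level implied by school / participant.
print(nested_terms)
```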
> This is very counter-intuitive to me - I have the feeling I'm missing something basic, but I have no idea what. Any help is much appreciated!
>
> Thank you very much in advance, kind regards,
>
> Gjalt-Jorn
>
>
> PS: the file with the analyses is at
> http://sciencerep.org/files/7/the%20cannabis%20show%20-%20analyses%20for%20mailing%20list.r
>
