[R-sig-ME] lmer: effects of forcing fixed intercepts and slopes
ONKELINX, Thierry
Thierry.ONKELINX at inbo.be
Tue Nov 6 17:25:07 CET 2012
Dear Gjalt-Jorn,
Your null model is too complex for your data. Having only one measurement per participant per moment, you cannot fit a random 'slope' along moment per participant. Note the perfect correlation in your null model for the nested random effect.
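A quick way to see the problem is to count the rows per participant and moment. The sketch below uses a made-up toy data frame (`toy` and its values are assumptions for illustration, not the real data file) with the same long layout as in the question:

```r
# Toy data in the same long format as the thread (values are made up):
# 3 participants, each measured once before and once after.
toy <- data.frame(
  participant = factor(rep(1:3, times = 2)),
  moment      = factor(rep(c("before", "after"), each = 3),
                       levels = c("before", "after"))
)

# Number of rows in each participant x moment cell
cell_counts <- table(toy$participant, toy$moment)
print(cell_counts)

# Largest cell count and total rows per participant
max_per_cell         <- max(cell_counts)
rows_per_participant <- rowSums(cell_counts)

# A random intercept plus a random 'moment' slope means two random
# effects per participant. With only two rows per participant there is
# no replication left to estimate their variances and correlation.
print(max_per_cell)
print(rows_per_participant)
```

If the same check on the real data shows at most one row per cell, the per-participant slope asks for as many random effects as there are observations.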
Even at the school level, the amount of data is not that large, and you end up with near-perfect correlations in this random effect. So I would advise dropping moment as a random slope.
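As a sketch of what dropping the random slope could look like (an illustration under assumptions, not a tested analysis of these data): the formulas below keep only random intercepts for school and for participant within school. `glmer()` is used because current lme4 releases fit binomial models through `glmer()` rather than `lmer(..., family = )`. The names `f_null`, `f_model`, `fit_null` and `fit_model` are hypothetical, and the fit only runs if lme4 is installed and `dat.long` has been loaded as in the question.

```r
# Random intercepts only: no 'moment' slope at either grouping level.
f_null  <- usedCannabis_bi ~ 1 + moment + (1 | school / participant)
f_model <- usedCannabis_bi ~ 1 + moment * cannabisShow +
  (1 | school / participant)

# Fit only when lme4 and the data from the question are available.
if (requireNamespace("lme4", quietly = TRUE) && exists("dat.long")) {
  fit_null  <- lme4::glmer(f_null,  data = dat.long,
                           family = binomial(link = "logit"))
  fit_model <- lme4::glmer(f_model, data = dat.long,
                           family = binomial(link = "logit"))

  # Inspect the variance components before trusting any comparison
  print(lme4::VarCorr(fit_model))

  # Likelihood ratio test of the two nested models
  print(anova(fit_null, fit_model))
}
```

Whether even the participant intercept is supported by the data is something to judge from the estimated variance components.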
Don't forget that the summary of a model tests different hypotheses than an LRT between two models! You might do some reading on that topic or get some local statistical advice.
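To make the difference concrete, here is a minimal sketch with an ordinary `glm()` and made-up counts (the numbers in `toy` are assumptions for illustration only, and there are no random effects here). Adding `moment * cannabisShow` to a model that already contains `moment` adds two terms, so the LRT is one joint test on 2 degrees of freedom, whereas the summary reports a separate Wald z-test for each coefficient.

```r
# Made-up counts for a 2 x 2 design (not the data from this thread)
toy <- expand.grid(
  moment       = c("before", "after"),
  cannabisShow = c("Control", "Intervention")
)
toy$users    <- c(30, 32, 28, 18)
toy$nonusers <- c(70, 68, 72, 82)

null <- glm(cbind(users, nonusers) ~ moment,
            family = binomial(link = "logit"), data = toy)
full <- update(null, . ~ . + moment * cannabisShow)

# Wald tests: one row per coefficient, each tested on its own
wald <- coef(summary(full))
print(wald)

# Likelihood ratio test: all added terms tested together
lrt <- anova(null, full, test = "Chisq")
print(lrt)
```

The two outputs answer different questions, so they do not have to agree.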
Best regards,
Thierry
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
Thierry.Onkelinx at inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
-----Original message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On behalf of Gjalt-Jorn Peters
Sent: Tuesday 6 November 2012 16:23
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] lmer: effects of forcing fixed intercepts and slopes
Dear all,
I ran into something I don't understand: I updated a model with some terms; none of the terms is significant, but the model suddenly fits A LOT better...
The background: I am running a model to test a relatively simple hypothesis: that an intervention aiming to reduce cannabis use is effective. It's a repeated measures design where we measured cannabis use of each student before and after the intervention. In addition to having repeated measures, students are nested in schools. A simple plot of the percentage of cannabis users before and after the intervention, in the control and the intervention groups, is at http://sciencerep.org/files/7/plot.png (this plot ignores the schools).
This is the datafile:
<R CODE>
### Load data
dat.long <-
read.table("http://sciencerep.org/files/7/the%20cannabis%20show%20-%20data%20in%20long%20format.tsv",
header=TRUE, sep = "\t");
### Set 'participant' as factor
dat.long$participant <- factor(dat.long$id);
head(dat.long);
</R CODE>
This is what the head looks like:
  id moment   school cannabisShow gender age usedCannabis_bi participant
1  1 before Zuidoost Intervention      2  NA              NA           1
2  2 before Zuidoost Intervention      2  NA               0           2
3  3 before Zuidoost Intervention      1  NA               1           3
4  4 before    Noord Intervention     NA  NA              NA           4
5  5 before    Noord Intervention     NA  NA               1           5
6  6 before    Noord Intervention      1  NA              NA           6
'school' has 8 levels; 'moment' has 2 levels ('before' and 'after'); 'cannabisShow' has 2 levels, 'Intervention' and 'Control'; 'usedCannabis_bi' has 2 levels, 0 and 1; and 'participant' is the participant identifier.
I run a null model and a 'real' model, comparing the fit. These are the formulations I use:
<R CODE>
rep_measures.1.null <- lmer(formula = usedCannabis_bi ~
1 + moment + (1 + moment | school / participant),
family=binomial(link = "logit"), data=dat.long);
rep_measures.1.model <- update(rep_measures.1.null, .~. + moment*cannabisShow);
rep_measures.1.null;
rep_measures.1.model;
anova(rep_measures.1.null, rep_measures.1.model);
</R CODE>
The second model, where I introduce the interaction between measurement moment and whether participants received the intervention (this should reflect an effect of the intervention), fits considerably better than the original model. But, the interaction is not significant. In fact, none of the fixed effects is - so I added terms to the model, none of these terms significantly contributes to the prediction of cannabis use, yet the model fits a lot better.
This seems to be a paradox. Could anybody maybe explain how this is possible?
I also looked at the situation where I impose fixed intercepts and slopes on the participant level (so intercepts and slopes could only vary per school):
<R CODE>
rep_measures.2.null <- lmer(formula = usedCannabis_bi ~
1 + moment + (1 + moment | school),
family=binomial(link = "logit"), data=dat.long);
rep_measures.2.model <- update(rep_measures.2.null, .~. + moment*cannabisShow);
rep_measures.2.null;
rep_measures.2.model;
anova(rep_measures.2.null, rep_measures.2.model);
</R CODE>
Now the interaction between 'measurement moment' and 'intervention' is significant, as I expected; but the improvement in fit between the null model and the 'full model' is much, much smaller.
This is very counter-intuitive to me - I have the feeling I'm missing something basic, but I have no idea what. Any help is much appreciated!
Thank you very much in advance, kind regards,
Gjalt-Jorn
PS: the file with the analyses is at
http://sciencerep.org/files/7/the%20cannabis%20show%20-%20analyses%20for%20mailing%20list.r
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models