[R-sig-ME] lmer: No significant coefficients, but significant improvement of model fit?

Gjalt-Jorn Peters gjalt-jorn at behaviorchange.eu
Thu Nov 8 11:16:22 CET 2012


Dear Ben,

thank you very, very much for your extensive answer!

I have run both models you suggest, and they fit equally well (Chi^2 = 
.99). In addition, the effect of each predictor is the same in both 
models (which makes sense, of course).

I will just look at each effect's significance level, bearing in mind 
your concern about the number of schools (i.e. 8 is not many schools).

Before attempting any further interpretation, though, I will first read 
up on mixed models using the valuable suggestions given earlier :-)

Again, thank you very, very much for your answer, kind regards,

Gjalt-Jorn

*Gjalt-Jorn Peters* | http://behaviorchange.eu

Behavior change research | Health psychology
Intervention development | Applied social psychology
http://greatergood.eu | http://ou.nl | http://maastrichtuniversity.nl


On 07-11-2012 16:00, Ben Bolker wrote:
>   <r-sig-mixed-models at ...> writes:
>
>>   Hey all, this is my first post - but I assume that, like on other
>> lists, brevity is appreciated, so I have a short version and a long
>> version:
>    thanks.  I will answer the short version and see how far I get
> with the long version.
>
>> SHORT VERSION, QUESTIONS ONLY:
>   
>> 1) how is it possible that using lmer, none of the fixed effects has
>> significant coefficients, yet the model with those parameters fits
>> significantly better than a model without those parameters? Is this
>> an example of why lmer didn't use to report p-values for the
>> coefficients?
>    This is not really an lmer question, but a more general modeling
> question.  There are a few things you could mean here, but I don't
> think any of them have to do with the "p-value issue", which is
> more one of how to deal with the unknown distribution of the test
> statistic under the null hypothesis for not-large data sets
> {see http://glmm.wikidot.com/faq for more links on the p-value stuff,
> and others}
>
>    * you could be asking about the difference between the results
> of summary() [which uses Wald tests based on local curvature]
> and anova() [which does a more precise test based on model comparison];
> anova() is not perfect, but it's more accurate than (and hence sometimes
> gives different results from) summary() -- see the sketch after this list
>    * you could be asking about multiple predictors, none of which
> is individually significant at p<0.05, but their combined effects
> (i.e. comparing a model with all predictors vs. none) are significant
> at p<0.05.  This is not really surprising, because the joint effect
> of the predictors can be stronger than any one individually.  (Also,
> if you're not working with a balanced, nested LMM, the effects of
> the predictors can interact.)
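>
>    As a minimal sketch of the two kinds of test (the variable and data
> names are taken from this thread, but the model objects m_full and
> m_null are just illustrative):
>
> library(lme4)
> ## full model: both predictors and their interaction
> m_full <- glmer(cannabisUse_bi ~ cannabisShow * moment +
>                   (1 | school/id.factor),
>                 family = binomial, data = dat.long)
> ## null model: intercept only, with the same random effects
> m_null <- glmer(cannabisUse_bi ~ 1 + (1 | school/id.factor),
>                 family = binomial, data = dat.long)
>
> summary(m_full)        ## Wald z-tests for each coefficient separately
> anova(m_full, m_null)  ## likelihood-ratio test of the joint effect of
>                        ## all three fixed-effect terms at once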
>
>> 2) what do the slash and the colon mean exactly when specifying lmer models?
>    A colon refers to an interaction, a slash refers to nesting (so
> ~a/b is equivalent to ~a+a:b, or "b nested within a"): there's more
> on this at the wikidot FAQ as well.
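>
>    A quick way to see the expansion for yourself (a tiny sketch on a
> bare formula, nothing from your data):
>
> attr(terms(~ a/b), "term.labels")
> ## the slash expands to the two terms "a" and "a:b";
> ## correspondingly, lme4 expands the random-effects term (1|a/b)
> ## into (1|a) + (1|a:b)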
>
>
>> LONG VERSION WITH BACKGROUND: I am inexperienced with mixed models,
>> but I have a dataset with several levels that needs to be
>> analysed - and I 'always' wanted to learn multilevel analysis
>> anyway, so I decided this was a good occasion.  However, there are
>> no courses at hand in the near future, so I'm trying to get there
>> with online resources and some books (such as "Discovering
>> Statistics Using R" by Andy Field and, in a slightly different
>> category, the Multilevel Analysis book by Joop Hox and the one by
>> Snijders & Bosker). However, apparently, I lack what it takes to
>> teach myself this :-/ So I apologise, but I decided to
>> draw on your wisdom.  I'm also kind of hoping that doing multilevel
>> analyses is a good way of learning how to do them.
>   
>> I must admit that I don't feel like I have mastered the lmer model
>> formulation, but I found a post by Harold Doran [1] where he
>> explains the lmer syntax. My data file is structured the same as the
>> one he models in fm3, fm4 and fm5. I have the following variables
>> (of interest):
> * cannabisUse_bi: a factor with two levels, "0" and "1". '0'
>    indicates no cannabis use in the past week; '1' indicates cannabis
>    use in the past week. This is the dependent variable (i.e. the
>    criterion).
> * moment: a factor with two levels, 'before' and 'after'
> * id.factor: a factor with 444 levels, the identification of each
>    participant (note that there are quite a lot of missing values;
>    only about 276 cases have no missing values)
> * school: a factor with 8 levels, each representing the school that
>    the participants attend
> * cannabisShow: a factor with 2 levels, 'control' and
>   'intervention' - this reflects whether or not a participant received
>   the 'intervention', which aimed to decrease cannabis use.
>   Participants in five schools received the intervention;
>   participants in three other schools didn't.
>   
>> Every person provided two datapoints (one before the intervention
>> took place, and one after); there are several persons in a school;
>> and there are several schools in each condition (level) of
>> cannabisShow.
>   
>> As far as I understand, this translates to "Moment is nested within
>> person ('id.factor'), which is nested within school, which is
>> nested within cannabisShow" (not sure about that last bit).
>    Although others on this list disagree, I don't find "nesting" to be
> very useful in the context of fixed effects, because the levels of
> fixed effects almost always have identical meanings across different
> levels of the random effect (i.e., "before" means the same for me as
> for you).
>
>   I would say the simplest sensible model would be
>
> glmer(cannabisUse_bi ~ cannabisShow*moment + (1|school/id.factor),
>      family=binomial, data=dat.long)
>
> which if your individuals are uniquely identified should be the same
> as using (1|school) + (1|id.factor) as the random effects.
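>
>    To check the "uniquely identified" condition directly, a small
> sketch using lme4's isNested() (assuming dat.long as above):
>
> with(dat.long, isNested(id.factor, school))
> ## TRUE means each id.factor value occurs in exactly one school, so
> ## (1|school/id.factor) and (1|school) + (1|id.factor) describe the
> ## same random-effects structure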
>
> But I agree that you may very well want to try to take into account
> whether the effects of the fixed effects differ among schools: you
> might _like_ to see whether they differ among individuals as well, but
> it is somewhere between impossible and very difficult to extract this
> from binary data per individual (I'm sure you can't identify the
> effects of cannabisShow, because each individual only gets one
> intervention, and I'm pretty sure that you can't identify the effects
> of before/after either, because all you have is binary data -- if you
> had continuous data you *might* be able to detect variation in slope
> among individuals, if it weren't confounded with residual error).
>
> So I would try
>
> glmer(cannabisUse_bi ~ cannabisShow*moment +
>     (cannabisShow*moment|school) + (1|id.factor), family=binomial,
>     data=dat.long)
>
> (assuming that id.factor is unique across schools)
>
>
>> Now, this model doesn't include the effect of the intervention, and
>> if I include that, I get:
>   
>> rep_measures.new.model <- lmer(usedCannabis_bi ~ 1 + moment *
>> cannabisShow + (moment|school/id.factor), family=binomial(link =
>> "logit"), data=dat.long);
>   
>> If I compare these two models using anova(), the second one fits
>> better (logLik from -182.02 to -166.68, ChiSq = 30.681, Df = 2, p =
>> 2.177e-07). However, when you look at rep_measures.new.model, none
>> of the fixed effects is significant. I may be completely wrong, but
>> doesn't this mean that neither the cannabisShow variable nor its
>> interaction with measurement moment (i.e. 'time') contributes to
>> explaining the dependent variable (i.e. cannabisUse_bi)?
>    Maybe the before/after variation among schools (moment|school) is
>    doing a lot?  Also, see my comment above about Wald tests.
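>
>    One way to look at that (a sketch: the reduced-model name is
> illustrative, I'd use glmer() rather than lmer() here, and keep in
> mind that a likelihood-ratio test on a variance component is
> conservative):
>
> VarCorr(rep_measures.new.model)   ## how large is the school-level
>                                   ## variance in the 'moment' effect?
> m_nosl <- glmer(usedCannabis_bi ~ 1 + moment * cannabisShow +
>                   (1 | school/id.factor),
>                 family = binomial, data = dat.long)
> anova(rep_measures.new.model, m_nosl)  ## drops the 'moment' slopes at
>                                        ## both grouping levels, so this
>                                        ## is only a rough check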
>   
>> (in fact, I'm also a bit confused as to the p-values that lmer
>> provides for the fixed effects. I thought that there were good
>> reasons not to report them - and that lmer wasn't supposed to? [3] (I
>> don't understand the post - I'm sadly not a statistician - but I
>> thought I got the gist.) Apparently this changed . . . ?)
>    glmer provides likelihood ratio tests, which are good when the
> sample size is large.  If you didn't have the school level I would say
> not to worry about it, but 8 schools is not a large number ...
>
>>   And now that I'm mailing anyway: what is the difference between
>> these two models?
>   
>> rep_measures.new.model.1 <- lmer(usedCannabis_bi ~ 1 + moment *
>> cannabisShow + (moment|school/id.factor), family=binomial(link =
>> "logit"), data=dat.long);
>> rep_measures.new.model.2 <- lmer(usedCannabis_bi ~ 1 + moment *
>> cannabisShow + (moment|id.factor:school), family=binomial(link =
>> "logit"), data=dat.long);
>   
>> R gives slightly (but only slightly) different coefficient
>> estimates; but for the first one, it seems to understand that school
>> is a level (with 8 values), whereas for the second one, this is
>> apparently not specified . . . What's the difference between the
>> slash and the colon for indicating levels (the levels have to be
>> 'the other way around', apparently?)?
>    The second leaves out the school effect, as specified above.
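>
>    Spelled out, using the same expansion rule as above:
>
>    (moment | school/id.factor)
>    ## expands to (moment | school) + (moment | school:id.factor)
>    (moment | id.factor:school)
>    ## is only the interaction-level term: there is no school-level
>    ## term, which is why the 8-level school factor "disappears" in
>    ## the second model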
>
>>   I'm sorry to bother the list with such basic questions. I've been
>> looking for a tutorial or explanation, but I've only been able to
>> find little bits of information that I pieced together into my
>> current (lack of) understanding . . .
>   
>> Thank you in advance!
>>
>> Gjalt-Jorn Peters
>   
>> PS: I've put the R script at
>> http://sciencerep.org/files/7/the%20cannabis%20show%20-%20analyses.r
>> (the part I'm talking about now starts after the line with "######
>> Behaviour", line 195 - the real analyses I'm talking about now
>> start at line 314). This .R file downloads the data from
>> http://sciencerep.org/files/7/the%20cannabis%20show%20-%20data.tsv
>   
>> The output you should get is at
>> http://sciencerep.org/files/7/the%20cannabis%20show%20-%20output.txt
>> (but the output file is kind of hard to interpret without the
>> analyses file, as I didn't "cat" all comments)
>   
>> [1] http://tolstoy.newcastle.edu.au/R/e2/help/06/10/3345.html
>> [2] http://www.rensenieuwenhuis.nl/r-sessions-17-generalized-multilevel-lme4/
>>      http://www.talkstats.com/showthread.php/14393-Need-help-with-lmer-model-specification-syntax-for-nested-mixed-model
>>      http://www.bodowinter.com/tutorial/bw_LME_tutorial.pdf
>> [3] https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html
>>
>>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models


