[R-sig-ME] lmer: No significant coefficients, but significant improvement of model fit?

Wed Nov 7 16:00:47 CET 2012

 <r-sig-mixed-models at ...> writes:

>  Hey all, this is my first post, but I assume that, as on other
> lists, brevity is appreciated, so I have a short version and a long
> version:

  Thanks.  I will answer the short version and see how far I get
with the long version.

> 
> SHORT VERSION, QUESTIONS ONLY:

> 1) how is it possible that using lmer, none of the fixed effects has
> significant coefficients, yet the model with those parameters fits
> significantly better than a model without those parameters? Is this
> an example of why lmer didn't use to report p-values for the
> coefficients?

  This is not really an lmer question, but a more general modeling
question.  There are a few things you could mean here, but I don't
think any of them have to do with the "p-value issue", which is
more one of how to deal with the unknown distribution of the test
statistic under the null hypothesis for not-large data sets
(see http://glmm.wikidot.com/faq for more links on the p-value issue
and on other topics).

  * you could be asking about the difference between the results
of summary() [which uses Wald tests based on the local curvature of
the log-likelihood at the estimates] and anova() [which does a more
precise likelihood ratio test based on model comparison]; anova() is
not perfect, but it's more accurate than (and hence sometimes
different from) summary()
  * you could be asking about multiple predictors, none of which
is individually significant at p<0.05, but their combined effects
(i.e. comparing a model with all predictors vs. none) are significant 
at p<0.05.  This is not really surprising, because the joint effect
of the predictors can be stronger than that of any one individually.
(Also, if you're not working with a balanced, nested LMM, the
predictors are generally not orthogonal, so the estimated effect of
each one depends on which others are in the model.)
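  Here is a sketch of both points with simulated (hypothetical) data.
It uses a plain glm() rather than glmer(), so it runs without lme4.
Two nearly collinear predictors will typically get weak individual
Wald tests in summary().  The likelihood ratio test from anova(),
which compares the full model with an intercept-only model, is
typically clearly significant.  The names and effect sizes are made
up for illustration.

```r
# Simulated illustration (hypothetical data, not the poster's)
set.seed(101)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly collinear with x1
y  <- rbinom(n, size = 1, prob = plogis(0.5 * x1 + 0.5 * x2))

full <- glm(y ~ x1 + x2, family = binomial)
null <- glm(y ~ 1,       family = binomial)

# Individual Wald z tests, as printed by summary()
wald_p <- coef(summary(full))[c("x1", "x2"), "Pr(>|z|)"]

# Joint likelihood ratio test of both predictors, via anova()
lrt   <- anova(null, full, test = "Chisq")
lrt_p <- lrt[2, "Pr(>Chi)"]

print(wald_p)
print(lrt)
```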

> 2) what do the slash and the colon mean exactly when specifying lmer models?

  A colon refers to an interaction, a slash refers to nesting (so
~a/b is equivalent to ~a+a:b, or "b nested within a"): there's more
on this at the wikidot FAQ as well.
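  You can check the expansion directly with base R's terms().  Here
a, b and y are placeholder names.  As far as I know, lme4 applies
the same slash expansion to the grouping part of a random-effects
term.  So (1|school/id.factor) becomes (1|school) +
(1|school:id.factor).

```r
# Term labels R generates for nesting (/), interaction (:) and crossing (*)
term_labels <- function(f) attr(terms(f), "term.labels")

term_labels(y ~ a/b)   # "a" "a:b"       (b nested within a)
term_labels(y ~ a:b)   # "a:b"           (interaction term only)
term_labels(y ~ a*b)   # "a" "b" "a:b"   (main effects plus interaction)
```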

> LONG VERSION WITH BACKGROUND: I am inexperienced with mixed models,
> but I have a dataset that has several levels that need to be
> analysed - and I 'always' wanted to learn multilevel analysis
> anyway, so I decided this was a good occasion.  However, there are
> no courses at hand in the near future, so I'm trying to get there
> with online resources and some books (such as "Discovering
> Statistics Using R" by Andy Field and, in a slightly different
> category, the Multilevel Analysis books by Joop Hox and by
> Snijders & Bosker). However, apparently, I lack what it takes to
> autodidactically learn this :-/ So I apologise, but I decided to
> draw on your wisdom.  I'm also kind of hoping that doing multilevel
> analyses is a good way of learning how to do them.

> I must admit that I don't feel like I master the lmer model
> formulation, but I found a post by Harold Doran [1] where he
> explains the lmer syntax. My data file is structured the same as the
> one he models in fm3, fm4 and fm5. I have the following variables
> (of interest):

> * cannabisUse_bi: a factor with two levels, "0" and "1". '0'
>   indicates no cannabis use in the past week; '1' indicates cannabis
>   use in the past week. This is the dependent variable (i.e. the
>   criterion).
> * moment: a factor with two levels, 'before' and 'after'
> * id.factor: a factor with 444 levels identifying each
>   participant (note that there are quite a lot of missing values;
>   only about 276 cases have no missing values)
> * school: a factor with 8 levels, each representing the school that
>   the participants attend
> * cannabisShow: a factor with 2 levels, 'control' and
>   'intervention'; this reflects whether a participant received the
>   'intervention', aimed to decrease cannabis use, or
>   not. Participants in five schools received the intervention;
>   participants in the three other schools didn't.

> Every person provided two datapoints (one before the intervention
> took place, and one after); there are several persons in a school;
> and there are several schools in each condition (level) of
> cannabisShow.

> As far as I understand, this translates to "Moment is nested within
> person ('id.factor'), which is nested within school, which is
> nested within cannabisShow" (not sure about that last bit).

  Although others on this list disagree, I don't find "nesting" to be
a very useful concept for fixed effects, because the levels of
fixed effects almost always have identical meanings across different
levels of the random effect (e.g., "before" means the same for me as
for you).

 I would say the simplest sensible model is

glmer(cannabisUse_bi ~ cannabisShow*moment + (1|school/id.factor), 
    family=binomial, data=dat.long)

which, if your individuals are uniquely identified across schools,
should be the same as using (1|school) + (1|id.factor) as the random
effects.
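  Whether that equivalence holds is easy to check before fitting.
The helper below is hypothetical, and the toy data are made up.  It
returns TRUE when every id.factor level occurs in only one school.
In that case school:id.factor defines the same groups as id.factor.

```r
# Hypothetical helper: is every participant ID confined to one school?
ids_unique_across_schools <- function(id, school) {
  schools_per_id <- tapply(school, id, function(s) length(unique(s)))
  all(schools_per_id == 1)
}

# Toy data (made up): three participants, two schools
toy <- data.frame(
  id.factor = factor(c(1, 1, 2, 2, 3, 3)),
  school    = factor(c("A", "A", "A", "A", "B", "B"))
)
ids_unique_across_schools(toy$id.factor, toy$school)    # TRUE

# Reusing ID 1 in school B breaks uniqueness
toy2 <- rbind(toy, data.frame(id.factor = factor(1), school = factor("B")))
ids_unique_across_schools(toy2$id.factor, toy2$school)  # FALSE
```

  If the check fails, the (1|school/id.factor) form is the safe one.
It treats ID 1 in school A and ID 1 in school B as different people.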

But I agree that you may very well want to try to take into account
whether the fixed effects differ among schools: you
might _like_ to see whether they differ among individuals as well, but
it is somewhere between impossible and very difficult to extract this
from binary data per individual (I'm sure you can't identify
among-individual variation in the effect of cannabisShow, because each
individual only gets one intervention, and I'm pretty sure that you
can't identify among-individual variation in the effect of before/after
either, because all you have is one binary observation per individual
per moment -- if you had continuous data you *might* be able to detect
variation in slope among individuals, if it weren't confounded with
residual error).
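  A quick way to see which random slopes are even candidates: a
predictor must take more than one value within some group for its
effect to vary among those groups.  This is a necessary condition,
not a sufficient one.  For example, moment varies within individuals,
yet with a single binary observation per moment the slope is still
confounded with residual variation, as argued above.  The helper
name and the toy data are hypothetical.

```r
# Hypothetical helper: does predictor x take more than one value
# within any level of grouping factor g?
varies_within <- function(x, g) {
  any(tapply(x, g, function(v) length(unique(v))) > 1)
}

# Toy data (made up): two participants, each measured before and after
toy <- data.frame(
  id.factor    = factor(c(1, 1, 2, 2)),
  moment       = factor(c("before", "after", "before", "after")),
  cannabisShow = factor(c("control", "control", "intervention", "intervention"))
)
varies_within(toy$cannabisShow, toy$id.factor)  # FALSE: one condition per person
varies_within(toy$moment, toy$id.factor)        # TRUE
```

  The same check with school as the grouping factor is worth running
on the real data.  If every school is entirely in one condition, it
returns FALSE for cannabisShow.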

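So I would try

glmer(cannabisUse_bi ~ cannabisShow*moment +
   (moment|school) + (1|id.factor), family=binomial,
   data=dat.long)

(assuming that id.factor is unique across schools; only the effect of
moment can vary among schools, since each school is entirely in either
the control or the intervention condition)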

> Now, this model doesn't include the effect of the intervention, and
> if I include that, I get:

> rep_measures.new.model <- glmer(usedCannabis_bi ~ 1 + moment *
> cannabisShow + (moment|school/id.factor), family=binomial(link =
> "logit"), data=dat.long)

> If I compare these two models using anova(), the second one fits
> better (logLik from -182.02 to -166.68, ChiSq = 30.681, Df = 2, p =
> 2.177e-07). However, when you look at rep_measures.new.model, none
> of the fixed effects is significant. I may be completely wrong, but
> doesn't this mean that neither the cannabisShow variable nor its
> interaction with measurement moment (i.e. 'time') contributes to
> explaining the dependent variable (i.e. cannabisUse_bi)?
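  For what it's worth, the numbers quoted above are internally
consistent.  The likelihood ratio statistic is twice the difference in
log-likelihoods.  Df = 2 matches the two extra fixed-effect parameters:
the cannabisShow main effect and its interaction with moment, one
each, since both factors have two levels.  This reproduces the
reported values from the two log-likelihoods alone.

```r
# Likelihood ratio test from the two reported log-likelihoods
ll_reduced <- -182.02   # model without cannabisShow and its interaction
ll_full    <- -166.68   # model with them (2 extra parameters)

chisq <- 2 * (ll_full - ll_reduced)               # 30.68
p     <- pchisq(chisq, df = 2, lower.tail = FALSE)
p                                                 # about 2.18e-07
```

  With 2 degrees of freedom the upper tail of the chi-squared
distribution is simply exp(-chisq/2), which gives the same value.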

  Maybe the before/after variation among schools (moment|school) is
  doing a lot?  Also, see my comment above about Wald tests.

> (in fact, I'm also a bit confused as to the p-values that lmer
> provides for the fixed effects. I thought that there were good
> reasons not to - and that lmer wasn't supposed to? [3] (I don't
> understand the post - I'm sadly not a statistician - but I thought
> I got the gist) Apparently this changed . . . ?)

  The p-values glmer reports in summary() come from Wald z tests,
which (like likelihood ratio tests) are good when the sample size is
large.  If you didn't have the school level I would say not to worry
about it, but 8 schools is not a large number ...
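  The Pr(>|z|) column that summary() prints for a binomial fit is a
Wald z test: the estimate divided by its standard error, compared with
a standard normal.  A minimal sketch of that calculation follows.  The
function name and the example numbers are hypothetical.  The normal
reference distribution is only asymptotically right, which is why a
small number of schools is a worry.

```r
# Hypothetical helper: two-sided Wald z test p-value
wald_z_p <- function(estimate, std_error) {
  z <- estimate / std_error
  2 * pnorm(-abs(z))
}

wald_z_p(1.96, 1)   # about 0.05
wald_z_p(0.8, 0.5)  # z = 1.6, about 0.11
```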

>  And now that I'm mailing anyway: what is the difference between
> these two models?

> rep_measures.new.model.1 <- glmer(usedCannabis_bi ~ 1 + moment *
> cannabisShow + (moment|school/id.factor), family=binomial(link =
> "logit"), data=dat.long)

> rep_measures.new.model.2 <- glmer(usedCannabis_bi ~ 1 + moment *
> cannabisShow + (moment|id.factor:school), family=binomial(link =
> "logit"), data=dat.long)

> R gives slightly (but only slightly) different coefficient
> estimates; but for the first one, it seems to understand that school
> is a level (with 8 values), whereas for the second one, this is
> apparently not specified . . . What's the difference between the
> slash and the colon for indicating levels (the levels have to be
> 'the other way around', apparently?)?

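  The second leaves out the school-level random effect.  As above,
school/id.factor expands to school + school:id.factor, so the first
model has random effects both for schools and for individuals within
schools.  id.factor:school is only the second of these terms: one
group per individual within a school, and nothing for variation among
schools.  Order matters for the slash (a/b means "b nested within a"),
but not for the colon (a:b is the same as b:a).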
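  Base R's terms() shows how the two grouping specifications from the
question expand.  As far as I know, lme4 expands the slash inside a
random-effects term the same way.  That would make
(moment|school/id.factor) equivalent to (moment|school) +
(moment|school:id.factor).

```r
# Term labels for the grouping parts of the two random-effects terms
term_labels <- function(f) attr(terms(f), "term.labels")

term_labels(~ school/id.factor)   # "school" "school:id.factor"
term_labels(~ id.factor:school)   # "id.factor:school" (no school term)

# Reversing the slash changes the meaning (school nested in id.factor):
term_labels(~ id.factor/school)   # "id.factor" "id.factor:school"
```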

>  I'm sorry to bother the list with such basic questions. I've been
> looking for a tutorial or explanation, but I've only been able to
> find little bits of information that I pieced together into my
> current (lack of) understanding . . .

> Thank you in advance!
> 
> Gjalt-Jorn Peters

> PS: I've put the R script at
> http://sciencerep.org/files/7/the%20cannabis%20show%20-%20analyses.r
> (the part I'm talking about now starts after the line with "######
> Behaviour", line 195 - the real analyses I'm talking about now
> start at line 314) This .R file downloads the data from
> http://sciencerep.org/files/7/the%20cannabis%20show%20-%20data.tsv

> The output you should get is at
> http://sciencerep.org/files/7/the%20cannabis%20show%20-%20output.txt
> (but the output file is kind of hard to interpret without the
> analyses file, as I didn't "cat" all comments)

> [1] http://tolstoy.newcastle.edu.au/R/e2/help/06/10/3345.html
> [2] http://www.rensenieuwenhuis.nl/r-sessions-17-generalized-multilevel-lme4/
>     http://www.talkstats.com/showthread.php/14393-Need-help-with-lmer-model-specification-syntax-for-nested-mixed-model
>     http://www.bodowinter.com/tutorial/bw_LME_tutorial.pdf
> [3] https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html