[R-sig-ME] Differences in degrees of freedom between a mixed-effects model and a gls model using nlme

Mon Feb 9 11:24:26 CET 2015

I don't want to derail this thread entirely, but it does make me wonder: Are people really concerned about calculating the "right" degrees of freedom in their applications anyway? I have pretty much stopped worrying about whether the software cleverly figures out the right dfs, as I hardly ever deal with situations where there is a clear and correct answer to that question. Even in the designed experiments I see, imbalance creeps in in various ways, the most obvious one being missing data due to attrition. (In Karl's example there is, of course, a clear answer, but my question is more general.)

I am sure that the type of applications one deals with has an influence on this matter. If you see nicely designed experiments with balanced data, getting the dfs right might seem like an important concern. Or if sample sizes are small (as in the number of individuals and/or number of repeated measurements), then it may matter whether the dfs are 10 or 100 for the conclusions you draw from a test (which, in the end, is then based, at least partly, on the p-value the software throws at you). But as far as I am concerned, I constantly (and grudgingly, with a lot of wishful thinking) need to rely on the asymptotic behavior of the estimates, standard errors, and test statistics every which way I turn anyway. Whether the dfs are 10, 40.5682..., or 100 is one of my least pressing concerns. If the conclusion doesn't pass the interocular traumatization test, I don't have much faith in it anyway.
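To put a rough number on the 10-versus-100 point, here is a minimal sketch in base R. The t statistic of 2.1 is made up for illustration (it is not from any model in this thread); the code compares two-sided p-values under the three dfs mentioned above and under the normal approximation.

```r
t_stat <- 2.1                        # hypothetical test statistic, chosen to be borderline
dfs    <- c(10, 40.5682, 100)        # the three dfs mentioned above

p_t <- 2 * pt(-abs(t_stat), df = dfs)   # two-sided p-values from the t distribution
p_z <- 2 * pnorm(-abs(t_stat))          # asymptotic (normal) reference

# Print the p-values side by side, labelled by df
print(round(c(setNames(p_t, dfs), normal = p_z), 4))
```

With this particular statistic the p-value is above 0.05 at 10 df and below it at 40.5682 and 100 df, so the choice of df only changes the conclusion when the df are small and the evidence is borderline to begin with.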

I know that this has come up before (http://glmm.wikidot.com/faq discusses it as well), and the fact that lme4 doesn't provide p-values is, in essence, a statement in the same direction, but I am just curious about other people's opinions on this.

Best,
Wolfgang

--   
Wolfgang Viechtbauer, Ph.D., Statistician   
Department of Psychiatry and Psychology   
School for Mental Health and Neuroscience   
Faculty of Health, Medicine, and Life Sciences   
Maastricht University, P.O. Box 616 (VIJV1)   
6200 MD Maastricht, The Netherlands   
+31 (43) 388-4170 | http://www.wvbauer.com   

> -----Original Message-----
> From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces at r-project.org]
> On Behalf Of Ben Bolker
> Sent: Monday, February 09, 2015 05:36
> To: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] Differences in degrees of freedom between a
> mixed-effects model and a gls model using nlme
> 
> Ken Beath <ken.beath at ...> writes:
> 
> >
> > All 3 (paired t-test, mixed effect and gls with compound symmetry) are
> > fitting the same model, and so should give the same result. That is what
> > you see with the first example. The gls model is not getting it wrong
> > except for the df.
> >
> > For the second the 3 model results should again be the same. I'm not
> > certain why they are not, but it may be numerical. Even though the data
> > come from a model that isn't correct for the fitting, that should be
> > irrelevant: it is the data that produce the model fit, not the model that
> > produces the data. Possibly estimates of the correlation are poor when
> > there is little correlation, and that flows through to the mixed effects
> > and gls results.
> >
> > The relationship to the unpaired t-test is probably irrelevant. Note also
> > that the default for the t.test is unequal variances whereas for a mixed
> > model it is equal variances.
> >
> > The df for gls is obviously in a sense a bug. Getting the df for a mixed
> > model isn't easy. Here we have a nice simple correlation structure and
> > there is an obvious correct answer, but usually there isn't one. If the
> > model assumed uncorrelated data then the gls df would be correct, so it is
> > necessary for the software to work out what is going on. Using parametric
> > bootstrapping to determine the underlying distribution seems a better
> > method if accuracy is important.
> >
> > Ken
> >
> 
>   For what it's worth you can easily see what gls() is doing to
> get its df, and confirm that it's naive, by printing nlme:::summary.gls:
> 
>   tTable[, "p-value"] <- 2 * pt(-abs(tTable[, "t-value"]),
>         dims$N - dims$p)
> 
> For what it's worth, I've found that the df calculations used by
> lme() often fail quite badly for random-slopes models ... the right df
> is often really hard to guess, even for simpler designs (i.e. where
> there really is a precise correspondence with an F distribution -- no
> correlation structures or lack of balance or crossed random effects).
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
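
A small sketch of the gls() point discussed above, assuming simulated paired data. The sample size, effect size, and variable names are made up for illustration and are not Karl's example. It fits the compound-symmetry gls model, then computes the p-value for the same t statistic under the naive N - p df that summary.gls uses and under the n - 1 df of a paired t-test.

```r
library(nlme)  # recommended package that ships with R

set.seed(1)
n    <- 10                                   # hypothetical number of subjects
id   <- factor(rep(seq_len(n), each = 2))    # two measurements per subject
cond <- factor(rep(c("a", "b"), times = n))
subj <- rep(rnorm(n), each = 2)              # shared subject effect induces correlation
y    <- subj + 0.5 * (cond == "b") + rnorm(2 * n, sd = 0.5)
d    <- data.frame(id, cond, y)

# gls with compound symmetry within subject
fit <- gls(y ~ cond, data = d, correlation = corCompSymm(form = ~ 1 | id))
tt  <- summary(fit)$tTable

df_naive  <- fit$dims$N - fit$dims$p         # N - p, as in nlme:::summary.gls
df_paired <- n - 1                           # df of the paired t-test

t_cond   <- tt["condb", "t-value"]
p_naive  <- 2 * pt(-abs(t_cond), df_naive)   # should match tt["condb", "p-value"]
p_paired <- 2 * pt(-abs(t_cond), df_paired)  # same statistic, paired-test df

print(c(df_naive = df_naive, df_paired = df_paired,
        p_naive = p_naive, p_paired = p_paired))
```

Here N - p is 2n - 2 = 18 while the paired t-test has n - 1 = 9 df, so the naive p-value is the smaller of the two for the same t statistic. How much that matters depends on the size of the statistic and of n.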