[R-sig-ME] Differences in degrees of freedom between a mixed-effects model and a gls model using nlme
Hufthammer, Karl Ove
karl.ove.hufthammer at helse-bergen.no
Fri Feb 6 09:34:07 CET 2015
Dear list members,
I'm having some problems understanding the difference in degrees of freedom between a mixed-effects model and a gls model, both fitted using the nlme package. I'm used to it being difficult to figure out the 'correct' (if such a thing exists) number of degrees of freedom in mixed-effects models, but in the following very simple example the *mixed-effects model* seems to use the correct degrees of freedom, while the gls model uses too many.
Here's the example. Basically, I want to test if a mixed-effects model is equivalent to a paired t-test.
# Generate some correlated data
set.seed(1)
n = 10
u = 5 + rnorm(n, sd=1)
x1 = u + rnorm(n)
x2 = u + rnorm(n) + 1
# A paired t-test is, as expected, more powerful than an unpaired one
t.test(x1, x2) # p = .1407
t.test(x1, x2, paired=TRUE) # p = .0597
The relevant line of output for the paired test is:
t = -2.1532, df = 9, p-value = 0.05972
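The lme() and gls() calls below use a long-format data frame df whose construction isn't shown in the post; a plausible reconstruction, given the column names values, ind and id used in the model formulas (the exact construction is my assumption), is:
# Reconstruct the long-format data frame assumed by the model calls below
df = data.frame(values = c(x1, x2),
                ind    = factor(rep(c("x1", "x2"), each = n)),
                id     = factor(rep(1:n, 2)))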
# Fit a linear mixed-effects model
library(nlme)
l = lme(values~ind, random=~1|id, data=df)
summary(l)
The relevant lines of output are:
Value Std.Error DF t-value p-value
indx2 0.617482 0.2867705 9 2.153226 0.0597
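A quick check (not part of the original post) that this p-value indeed corresponds to a t-distribution with 9 degrees of freedom:
2*pt(2.153226, 9, lower.tail = FALSE) # 0.0597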
So we get the same t-value (to five decimal places; it differs only in the last digit), degrees of freedom and p-value as in the paired t-test. Let's now try to fit a gls model for correlated data. This should in theory be equivalent to doing a paired t-test. We get:
# Fit a correlated gls model
library(nlme)
l2 = gls(values~ind, correlation=corSymm(form=~1|id), data=df)
summary(l2)
Value Std.Error t-value p-value
indx2 0.617482 0.2867703 2.153228 0.0451
The t-value is the same as in the t-test (here to *six* decimal places), but the p-value is too small. The reason is that the test uses twice as many degrees of freedom as it should, 18 instead of 9:
2*pt(2.153228, 18, lower.tail = FALSE) # 0.04510817
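Where the 18 comes from: gls() appears to use the residual degrees of freedom N - p, which can be read off the fitted object (a check not in the original post; the dims component holds the model dimensions):
l2$dims$N - l2$dims$p # 20 observations - 2 coefficients = 18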
So the mixed-effects model gets it right, while the gls model gets it wrong. But it's actually the *gls* model that should be equivalent to the paired t-test, not the mixed-effects model. We can easily see this by changing the correlation to be negative:
set.seed(2)
n = 10
u = 5 + rnorm(n, sd=1)
x1 = u + rnorm(n)
x2 = 15 - 2*u + rnorm(n) + 1
cor(x1, x2) # -0.50
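The calls behind the results quoted below aren't shown in the post; presumably the data frame and model fits were rebuilt exactly as above, along the lines of (again my reconstruction):
df = data.frame(values = c(x1, x2),
                ind    = factor(rep(c("x1", "x2"), each = n)),
                id     = factor(rep(1:n, 2)))
t.test(x1, x2)                # unpaired (Welch) t-test
t.test(x1, x2, paired = TRUE) # paired t-test
l  = lme(values~ind, random=~1|id, data=df)
l2 = gls(values~ind, correlation=corSymm(form=~1|id), data=df)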
Unpaired t-test:
t = -0.5052, df = 15.753, p-value = 0.6204
Paired t-test (note that the p-value is *greater* than for the unpaired case):
t = -0.418, df = 9, p-value = 0.6857
Mixed-effects model:
Value Std.Error DF t-value p-value
indx2 0.481868 0.9537336 9 0.505244 0.6255
Of course, the mixed-effects model doesn't really fit the data. The random-effect variance is estimated to be ~0, so this basically corresponds to an *unpaired* t-test (but one that assumes equal variances). Note that the t-statistic is equal to the t-statistic of the unpaired t-test (though the equal-variance version of that test uses 18 degrees of freedom, while this model uses 9).
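Both points can be checked on the fitted objects (checks not in the original post):
VarCorr(l)                       # the random-intercept SD is essentially 0
t.test(x1, x2, var.equal = TRUE) # same t-statistic, with 18 degrees of freedom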
Now, if I've understood everything correctly, the gls model should *exactly* correspond to a paired t-test; it's the same underlying model. And indeed we get the same t-statistic (0.418):
Value Std.Error t-value p-value
indx2 0.481868 1.1526947 0.418036 0.6809
But the p-value differs somewhat, since the gls model assumes 18 degrees of freedom, while the t-test (correctly) assumes 9 degrees of freedom.
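Both p-values are easy to reproduce from the t-statistic (a check not in the original post):
2*pt(0.418036, 18, lower.tail = FALSE) # 0.6809 (gls)
2*pt(0.418036,  9, lower.tail = FALSE) # 0.6857 (paired t-test)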
At least for positive correlations, a mixed-effects random-intercept model and a gls model with compound symmetry should be *almost* equivalent, so I see no reason why the gls test statistic should be given twice as many degrees of freedom as the one from the mixed-effects model. So why the discrepancy? Should I simply not trust the degrees-of-freedom calculations from gls()?
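One way to probe the claimed near-equivalence directly (a sketch, not part of the original post, and again assuming the data-frame construction above) is to regenerate the first, positively correlated data set and compare the fixed-effect tables of a random-intercept lme fit and a compound-symmetry gls fit:
set.seed(1)
n = 10
u = 5 + rnorm(n, sd=1)
x1 = u + rnorm(n)
x2 = u + rnorm(n) + 1
df = data.frame(values = c(x1, x2),
                ind    = factor(rep(c("x1", "x2"), each = n)),
                id     = factor(rep(1:n, 2)))
summary(lme(values~ind, random=~1|id, data=df))$tTable
summary(gls(values~ind, correlation=corCompSymm(form=~1|id), data=df))$tTable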
--
Karl Ove Hufthammer