[R-sig-ME] Two-part question about inference and model structure
Dan
ieshan at gmail.com
Thu Nov 16 15:12:28 CET 2017
Hi all:
A two-part question about inference and model structure in repeated
measures designs with missing data.
Suppose data structured the following way, which mimics an experiment in
which participants are measured once per time (say, on days 1, 2, 3, 4) for
a particular dv we'll call "score". Gender is a between subjects effect.
Question 1:
Question 1 deals with model specification for a repeated design in which
participants are measured once at each time.
library(lmerTest)
library(car)
library(ez)
set.seed(20)
options(contrasts=c("contr.sum","contr.poly"))
data.in <- data.frame(PID = rep(seq(from = 1,
to = 20, by = 1), 4),
score = sample(x = 1:20,
size = 80,
replace = TRUE),
time = c(rep(1,20),rep(2,20),rep(3,20),rep(4,20)),
gender = rep(sample(x = 1:2,size=20,replace=TRUE),4)
)
data.in$time <- as.factor(data.in$time)
data.in$PID <- as.factor(data.in$PID)
data.in$gender <- as.factor(data.in$gender)
data.in$score <- data.in$score + ifelse(data.in$gender==1&data.in
$time==2,sample(x=13:18),0)
The model:
fit.lmer <- lmer(score~gender*time+(1|PID),data=data.in)
> summary(fit.lmer)Linear mixed model fit by REML
t-tests use Satterthwaite approximations to degrees of freedom ['lmerMod']
Formula: score ~ gender * time + (1 | PID)
Data: data.in
REML criterion at convergence: 493.3
Scaled residuals:
Min 1Q Median 3Q Max
-1.50266 -0.85711 0.04498 0.67803 1.87416
Random effects:
Groups Name Variance Std.Dev.
PID (Intercept) 5.399e-15 7.348e-08
Residual 3.676e+01 6.063e+00
Number of obs: 80, groups: PID, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 11.4773 0.6813 72.0000 16.846 < 2e-16 ***
gender1 2.2727 0.6813 72.0000 3.336 0.00135 **
time1 -1.1944 1.1801 72.0000 -1.012 0.31484
time2 5.6641 1.1801 72.0000 4.800 8.36e-06 ***
time3 -2.5480 1.1801 72.0000 -2.159 0.03417 *
gender1:time1 -1.4444 1.1801 72.0000 -1.224 0.22493
gender1:time2 5.1414 1.1801 72.0000 4.357 4.30e-05 ***
gender1:time3 -1.9798 1.1801 72.0000 -1.678 0.09774 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) gendr1 time1 time2 time3 gnd1:1 gnd1:2
gender1 0.100
time1 0.000 0.000
time2 0.000 0.000 -0.333
time3 0.000 0.000 -0.333 -0.333
gender1:tm1 0.000 0.000 0.100 -0.033 -0.033
gender1:tm2 0.000 0.000 -0.033 0.100 -0.033 -0.333
gender1:tm3 0.000 0.000 -0.033 -0.033 0.100 -0.333 -0.333
It does not seem possible in this case to fit a model with random slopes, e.g.:
fit.lmer <- lmer(score~gender*time+(time|PID),data=data.in)
Because R returns an error in that case, as time is treated here as a
factor variable:
Error: number of observations (=80) <= number of random effects (=80)
for term (time | PID); the random-effects parameters and the residual
variance (or scale parameter) are probably unidentifiable
To test for significance here, one can do:
anova(fit.lmer,type=3,ddf="Kenward-Roger")
> anova(fit.lmer,type=3,ddf="Kenward-Roger")Analysis of Variance Table of type III with Kenward-Roger
approximation for degrees of freedom
Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)
gender 409.09 409.09 1 18 11.1275 0.0036787 **
time 865.15 288.38 3 54 7.8442 0.0001952 ***
gender:time 700.70 233.57 3 54 6.3531 0.0009083 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
or
> Anova(fit.lmer,type=3,test="F")Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)
Response: score
F Df Df.res Pr(>F)
(Intercept) 283.7785 1 18 1.822e-12 ***
gender 11.1275 1 18 0.0036787 **
time 7.8442 3 54 0.0001952 ***
gender:time 6.3531 3 54 0.0009083 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Which both return the same values. [Yes, there are theoretical /
statistical questions here about using Type 3 vs. Type 2 SS, but that is
not the larger question].
These values are extremely close to the values reported by a standard
'Design of Experiments' factorial analysis, given by ezANOVA:
fit.ez <- ezANOVA(data.in
,dv=score
,wid=PID
,within=time
,between=gender,
type=3,detailed=TRUE)
print(fit.ez)
> print(fit.ez)$ANOVA
Effect DFn DFd SSn SSd F p
p<.05 ges
1 (Intercept) 1 18 10432.8409 598.4091 313.817319 7.735877e-13
* 0.7976269
2 gender 1 18 409.0909 598.4091 12.305355 2.512273e-03
* 0.1338604
3 time 3 54 865.1490 2048.6010 7.601618 2.493967e-04
* 0.2463297
4 gender:time 3 54 700.6990 2048.6010 6.156680 1.118784e-03
* 0.2093070
i.e., the F values are very close.
So, question 1: In this case, where there is one observation per subject
with time as a factor, it seems appropriate/valid to specify the mixed
effects model in this way - i.e., without a Time|Subject slope, and just
using the specification above. Correct?
Question 2:
I had previously posed this question to a few others, but wanted to include
it in this thread because it deals with the same model structure.
Suppose there are some missing data in this experiment, e.g.:
data.in.missing <- data.in[sample(80,60,replace=FALSE),]
[Yes, I know that's a lot of missing data, just using for example purposes.]
That model gives:
> summary(fit.lmer.missing)Linear mixed model fit by REML
t-tests use Satterthwaite approximations to degrees of freedom ['lmerMod']
Formula: score ~ gender * time + (1 | PID)
Data: data.in.missing
REML criterion at convergence: 358
Scaled residuals:
Min 1Q Median 3Q Max
-1.73735 -0.86331 -0.01072 0.69862 1.92563
Random effects:
Groups Name Variance Std.Dev.
PID (Intercept) 5.327e-13 7.299e-07
Residual 3.396e+01 5.828e+00
Number of obs: 60, groups: PID, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 12.5771 0.7629 52.0000 16.485 < 2e-16 ***
gender1 2.3916 0.7629 52.0000 3.135 0.002827 **
time1 -0.7200 1.3398 52.0000 -0.537 0.593292
time2 4.8118 1.3794 52.0000 3.488 0.000999 ***
time3 -2.3896 1.2820 52.0000 -1.864 0.067966 .
gender1:time1 -1.2488 1.3398 52.0000 -0.932 0.355617
gender1:time2 7.2195 1.3794 52.0000 5.234 3.03e-06 ***
gender1:time3 -3.3291 1.2820 52.0000 -2.597 0.012204 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) gendr1 time1 time2 time3 gnd1:1 gnd1:2
gender1 0.081
time1 0.024 -0.046
time2 0.074 0.134 -0.371
time3 -0.053 -0.048 -0.323 -0.344
gender1:tm1 -0.046 0.024 0.026 -0.077 0.027
gender1:tm2 0.134 0.074 -0.077 0.223 -0.080 -0.371
gender1:tm3 -0.048 -0.053 0.027 -0.080 0.029 -0.323 -0.344
With the following ANOVA:
> anova(fit.lmer.missing,type=3)Analysis of Variance Table of type III with Satterthwaite
approximation for degrees of freedom
Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)
gender 333.75 333.75 1 52 9.8268 0.002827 **
time 439.48 146.49 3 52 4.3132 0.008630 **
gender:time 974.01 324.67 3 52 9.5594 3.944e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Supposing there are not particular combinations of gender and time
missing and supposing the data are not missing for reasons that would
have affected the dv in some serious way (i.e., lets assume for
argument that the data are missing completely at random).
Question 2: My understanding is that the inference is still valid in this
case (but, caveated above and also by the poorer estimation with missing
values). Can folks confirm that is true?
Apologies if I have left out something critical needed to run these
examples. Just wrote some pseudodata.
Best-
Dan
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list