[R-sig-ME] Two-part question about inference and model structure

Thu Nov 16 15:12:28 CET 2017

Hi all:

A two-part question about inference and model structure in repeated
measures designs with missing data.

Suppose data structured the following way, which mimics an experiment in
which participants are measured once per time (say, on days 1, 2, 3, 4) for
a particular dv we'll call "score". Gender is a between subjects effect.

Question 1:
Question 1 deals with model specification for a repeated design in which
participants are measured once at each time.

library(lmerTest)
library(car)
library(ez)

set.seed(20)

options(contrasts=c("contr.sum","contr.poly"))

data.in <- data.frame(PID = rep(seq(from = 1,
                                   to = 20, by = 1), 4),
                     score = sample(x = 1:20,
                                     size = 80,
                                     replace = TRUE),
                     time = c(rep(1,20),rep(2,20),rep(3,20),rep(4,20)),
                     gender = rep(sample(x = 1:2,size=20,replace=TRUE),4)
)

data.in$time <- as.factor(data.in$time)
data.in$PID <- as.factor(data.in$PID)
data.in$gender <- as.factor(data.in$gender)
data.in$score <- data.in$score + ifelse(data.in$gender==1&data.in
$time==2,sample(x=13:18),0)

The model:

fit.lmer <- lmer(score~gender*time+(1|PID),data=data.in)

> summary(fit.lmer)Linear mixed model fit by REML
t-tests use  Satterthwaite approximations to degrees of freedom ['lmerMod']
Formula: score ~ gender * time + (1 | PID)
   Data: data.in

REML criterion at convergence: 493.3

Scaled residuals:
     Min       1Q   Median       3Q      Max
-1.50266 -0.85711  0.04498  0.67803  1.87416

Random effects:
 Groups   Name        Variance  Std.Dev.
 PID      (Intercept) 5.399e-15 7.348e-08
 Residual             3.676e+01 6.063e+00
Number of obs: 80, groups:  PID, 20

Fixed effects:
              Estimate Std. Error      df t value Pr(>|t|)
(Intercept)    11.4773     0.6813 72.0000  16.846  < 2e-16 ***
gender1         2.2727     0.6813 72.0000   3.336  0.00135 **
time1          -1.1944     1.1801 72.0000  -1.012  0.31484
time2           5.6641     1.1801 72.0000   4.800 8.36e-06 ***
time3          -2.5480     1.1801 72.0000  -2.159  0.03417 *
gender1:time1  -1.4444     1.1801 72.0000  -1.224  0.22493
gender1:time2   5.1414     1.1801 72.0000   4.357 4.30e-05 ***
gender1:time3  -1.9798     1.1801 72.0000  -1.678  0.09774 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) gendr1 time1  time2  time3  gnd1:1 gnd1:2
gender1      0.100
time1        0.000  0.000
time2        0.000  0.000 -0.333
time3        0.000  0.000 -0.333 -0.333
gender1:tm1  0.000  0.000  0.100 -0.033 -0.033
gender1:tm2  0.000  0.000 -0.033  0.100 -0.033 -0.333
gender1:tm3  0.000  0.000 -0.033 -0.033  0.100 -0.333 -0.333

It does not seem possible in this case to fit a model with random slopes, e.g.:
fit.lmer <- lmer(score~gender*time+(time|PID),data=data.in)

Because R returns an error in that case, as time is treated here as a
factor variable:

Error: number of observations (=80) <= number of random effects (=80)
for term (time | PID); the random-effects parameters and the residual
variance (or scale parameter) are probably unidentifiable

To test for significance here, one can do:

anova(fit.lmer,type=3,ddf="Kenward-Roger")

> anova(fit.lmer,type=3,ddf="Kenward-Roger")Analysis of Variance Table of type III  with  Kenward-Roger
approximation for degrees of freedom
            Sum Sq Mean Sq NumDF DenDF F.value    Pr(>F)
gender      409.09  409.09     1    18 11.1275 0.0036787 **
time        865.15  288.38     3    54  7.8442 0.0001952 ***
gender:time 700.70  233.57     3    54  6.3531 0.0009083 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

or

> Anova(fit.lmer,type=3,test="F")Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)

Response: score
                   F Df Df.res    Pr(>F)
(Intercept) 283.7785  1     18 1.822e-12 ***
gender       11.1275  1     18 0.0036787 **
time          7.8442  3     54 0.0001952 ***
gender:time   6.3531  3     54 0.0009083 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Which both return the same values. [Yes, there are theoretical /
statistical questions here about using Type 3 vs. Type 2 SS, but that is
not the larger question].

These values are extremely close to the values reported by a standard
'Design of Experiments' factorial analysis, given by ezANOVA:

fit.ez <- ezANOVA(data.in
        ,dv=score
        ,wid=PID
        ,within=time
        ,between=gender,
        type=3,detailed=TRUE)
print(fit.ez)

> print(fit.ez)$ANOVA
       Effect DFn DFd        SSn       SSd          F            p
p<.05       ges
1 (Intercept)   1  18 10432.8409  598.4091 313.817319 7.735877e-13
* 0.7976269
2      gender   1  18   409.0909  598.4091  12.305355 2.512273e-03
* 0.1338604
3        time   3  54   865.1490 2048.6010   7.601618 2.493967e-04
* 0.2463297
4 gender:time   3  54   700.6990 2048.6010   6.156680 1.118784e-03
* 0.2093070

i.e., the F values are very close.

So, question 1: In this case, where there is one observation per subject
with time as a factor, it seems appropriate/valid to specify the mixed
effects model in this way - i.e., without a Time|Subject slope, and just
using the specification above. Correct?

Question 2:
I had previously posed this question to a few others, but wanted to include
it in this thread because it deals with the same model structure.

Suppose there are some missing data in this experiment, e.g.:
data.in.missing <- data.in[sample(80,60,replace=FALSE),]

[Yes, I know that's a lot of missing data, just using for example purposes.]

That model gives:

> summary(fit.lmer.missing)Linear mixed model fit by REML
t-tests use  Satterthwaite approximations to degrees of freedom ['lmerMod']
Formula: score ~ gender * time + (1 | PID)
   Data: data.in.missing

REML criterion at convergence: 358

Scaled residuals:
     Min       1Q   Median       3Q      Max
-1.73735 -0.86331 -0.01072  0.69862  1.92563

Random effects:
 Groups   Name        Variance  Std.Dev.
 PID      (Intercept) 5.327e-13 7.299e-07
 Residual             3.396e+01 5.828e+00
Number of obs: 60, groups:  PID, 20

Fixed effects:
              Estimate Std. Error      df t value Pr(>|t|)
(Intercept)    12.5771     0.7629 52.0000  16.485  < 2e-16 ***
gender1         2.3916     0.7629 52.0000   3.135 0.002827 **
time1          -0.7200     1.3398 52.0000  -0.537 0.593292
time2           4.8118     1.3794 52.0000   3.488 0.000999 ***
time3          -2.3896     1.2820 52.0000  -1.864 0.067966 .
gender1:time1  -1.2488     1.3398 52.0000  -0.932 0.355617
gender1:time2   7.2195     1.3794 52.0000   5.234 3.03e-06 ***
gender1:time3  -3.3291     1.2820 52.0000  -2.597 0.012204 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) gendr1 time1  time2  time3  gnd1:1 gnd1:2
gender1      0.081
time1        0.024 -0.046
time2        0.074  0.134 -0.371
time3       -0.053 -0.048 -0.323 -0.344
gender1:tm1 -0.046  0.024  0.026 -0.077  0.027
gender1:tm2  0.134  0.074 -0.077  0.223 -0.080 -0.371
gender1:tm3 -0.048 -0.053  0.027 -0.080  0.029 -0.323 -0.344

With the following ANOVA:

> anova(fit.lmer.missing,type=3)Analysis of Variance Table of type III  with  Satterthwaite
approximation for degrees of freedom
            Sum Sq Mean Sq NumDF DenDF F.value    Pr(>F)
gender      333.75  333.75     1    52  9.8268  0.002827 **
time        439.48  146.49     3    52  4.3132  0.008630 **
gender:time 974.01  324.67     3    52  9.5594 3.944e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Supposing there are not particular combinations of gender and time
missing and supposing the data are not missing for reasons that would
have affected the dv in some serious way (i.e., lets assume for
argument that the data are missing completely at random).

Question 2: My understanding is that the inference is still valid in this
case (but, caveated above and also by the poorer estimation with missing
values). Can folks confirm that is true?

Apologies if I have left out something critical needed to run these
examples. Just wrote some pseudodata.

Best-
Dan

	[[alternative HTML version deleted]]