[R-sig-ME] pre/post with partial participation

Tue Nov 13 17:11:22 CET 2018

I have a crazy ANOVA question and would appreciate your advice.

We have a project that did a pre-post measurement, but the
participation in the data collection was haphazard.  There are only 19
people that participated pre-post, but there are about 40 that
participated only in the pre phase and 30 that participated in the
post phase.

I don't have the data to show you, but I made some up. I've got
ID=1:19 for the ones who are both in pre and post data samples, and ID
20:59 are in pre only and 60:89 are post only.

In my first example, the data has no true random effect and lmer gets
the correct estimate, the random effect is estimated as 0.00, or close
to it.  I find that slight differences in the way I generate the data
(either I get exactly 0.0 or 7 x 10^-13 or similar).

This way of making the data generates a "full pre/post" data set and
then throws away pre and post observations for the missing cases:

set.seed(234234)
dat4 <- data.frame(ID = rep(1:89, 2), x = gl(2, 89, labels = c("pre", "post")))
err <- rnorm(length(dat4$x), 0, 1)
b <- 0
beta <- 4
dat4$y <- ifelse(dat4$x == "pre", 40 + err, 40 + beta + err) + b
## Now omit the
## post measurement for ID 20:59
## pre measurement for ID 60:89
dat4 <- dat4[!(dat4$ID %in%  20:59 & dat4$x == "post"), ]
dat4 <- dat4[!(dat4$ID %in%  60:89 & dat4$x == "pre"), ]

library(lme4)

m1 <- lmer(y ~ x + (1 | ID), dat4)
summary(m1)

Output shows nearly 0 random ID variance:

Linear mixed model fit by REML ['lmerMod']
Formula: y ~ x + (1 | ID)
   Data: dat4

REML criterion at convergence: 313.3

Scaled residuals:
    Min      1Q  Median      3Q     Max
-2.4502 -0.6261 -0.0361  0.5753  3.5321

Random effects:
 Groups   Name        Variance  Std.Dev.
 ID       (Intercept) 7.342e-13 8.569e-07
 Residual             1.043e+00 1.021e+00
Number of obs: 108, groups:  ID, 89

Fixed effects:
            Estimate Std. Error t value
(Intercept)  39.8946     0.1330  299.99
xpost         4.2518     0.1974   21.54

I thought that was a happy result, the pre/post effect is estimated
reasonably and the estimator does not find a random effect if there is
none.

Then I put in a random effect.
set.seed(234234)
dat5 <- data.frame(ID = rep(1:89, 2), x = gl(2, 89, labels = c("pre", "post")))
err <- rnorm(length(dat5$x), 0, 1)
b <- rep(rnorm(89, 0, 1), 2)
beta <- 4
dat5$y <- ifelse(dat5$x == "pre", 40 + err, 40 + beta + err) + b
dat5 <- dat5[!(dat5$ID %in%  20:59 & dat5$x == "post"), ]
dat5 <- dat5[!(dat5$ID %in%  60:89 & dat5$x == "pre"), ]
m2 <- lmer(y ~ x + (1 | ID), dat5)
summary(m2)

> summary(m2)
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ x + (1 | ID)
   Data: dat5

REML criterion at convergence: 367.1

Scaled residuals:
    Min      1Q  Median      3Q     Max
-1.8537 -0.3634 -0.0253  0.3611  2.1063

Random effects:
 Groups   Name        Variance Std.Dev.
 ID       (Intercept) 1.2441   1.1154
 Residual             0.6632   0.8144
Number of obs: 108, groups:  ID, 89

Fixed effects:
            Estimate Std. Error t value
(Intercept)  39.9205     0.1704  234.29
xpost         4.1333     0.2069   19.98

This "seems" to work reasonably.  What dangers await?

-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.