[R-sig-ME] pre/post with partial participation

Tue Nov 13 17:34:15 CET 2018

Dear Paul,

If the error variance is much larger than the random effect variance, then
the estimated random effect might absorb some of the errors, especially if
the pre and post observations of a subject have errors in the same
direction.
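
A minimal sketch of how one could check this by simulation, reusing the
design from the quoted message (ID 1:19 pre and post, 20:59 pre only, 60:89
post only). The helper name sim_id_var, the subject-level standard deviation
of 0.3, and the 100 replications are assumptions made for illustration; they
are not taken from the original data.

```r
library(lme4)

## Hypothetical helper: simulate one partial pre/post data set and return
## the ID variance estimated by lmer.
sim_id_var <- function(sd_id, sd_err = 1, beta = 4) {
  dat <- data.frame(ID = rep(1:89, 2),
                    x = gl(2, 89, labels = c("pre", "post")))
  b <- rep(rnorm(89, 0, sd_id), 2)       # true subject effect, same pre and post
  err <- rnorm(nrow(dat), 0, sd_err)     # observation-level error
  dat$y <- 40 + beta * (dat$x == "post") + b + err
  ## drop post for ID 20:59 and pre for ID 60:89
  dat <- dat[!(dat$ID %in% 20:59 & dat$x == "post"), ]
  dat <- dat[!(dat$ID %in% 60:89 & dat$x == "pre"), ]
  m <- suppressMessages(lmer(y ~ x + (1 | ID), dat))
  as.data.frame(VarCorr(m))$vcov[1]      # first row is the ID intercept variance
}

set.seed(1)
## true ID variance is 0.3^2 = 0.09, error variance is 1
est <- replicate(100, sim_id_var(sd_id = 0.3))
summary(est)
```

Comparing the spread of est with the true value of 0.09 gives a rough idea of
how much of the error the random effect picks up when only 19 subjects have
both measurements.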

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx using inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be


On Tue 13 Nov 2018 at 17:12, Paul Johnson <pauljohn32 using gmail.com> wrote:

> I have a crazy ANOVA question and would appreciate your advice.
>
> We have a project that did a pre-post measurement, but the
> participation in the data collection was haphazard.  There are only 19
> people that participated in both the pre and post phases, but there are
> about 40 that participated only in the pre phase and 30 that
> participated only in the post phase.
>
> I don't have the data to show you, but I made some up. I've got
> ID=1:19 for the ones who are both in pre and post data samples, and ID
> 20:59 are in pre only and 60:89 are post only.
>
> In my first example, the data has no true random effect and lmer gets
> the correct estimate: the random effect variance is estimated as 0.00,
> or close to it.  I find that slight differences in the way I generate
> the data change this (I get either exactly 0.0 or 7 x 10^-13 or
> similar).
>
> This way of making the data generates a "full pre/post" data set and
> then throws away pre and post observations for the missing cases:
>
> set.seed(234234)
> dat4 <- data.frame(ID = rep(1:89, 2), x = gl(2, 89, labels = c("pre",
> "post")))
> err <- rnorm(length(dat4$x), 0, 1)
> b <- 0
> beta <- 4
> dat4$y <- ifelse(dat4$x == "pre", 40 + err, 40 + beta + err) + b
> ## Now omit the
> ## post measurement for ID 20:59
> ## pre measurement for ID 60:89
> dat4 <- dat4[!(dat4$ID %in%  20:59 & dat4$x == "post"), ]
> dat4 <- dat4[!(dat4$ID %in%  60:89 & dat4$x == "pre"), ]
>
> library(lme4)
>
> m1 <- lmer(y ~ x + (1 | ID), dat4)
> summary(m1)
>
> Output shows nearly 0 random ID variance:
>
> Linear mixed model fit by REML ['lmerMod']
> Formula: y ~ x + (1 | ID)
>    Data: dat4
>
> REML criterion at convergence: 313.3
>
> Scaled residuals:
>     Min      1Q  Median      3Q     Max
> -2.4502 -0.6261 -0.0361  0.5753  3.5321
>
> Random effects:
>  Groups   Name        Variance  Std.Dev.
>  ID       (Intercept) 7.342e-13 8.569e-07
>  Residual             1.043e+00 1.021e+00
> Number of obs: 108, groups:  ID, 89
>
> Fixed effects:
>             Estimate Std. Error t value
> (Intercept)  39.8946     0.1330  299.99
> xpost         4.2518     0.1974   21.54
>
> I thought that was a happy result: the pre/post effect is estimated
> reasonably and the estimator does not find a random effect if there is
> none.
>
> Then I put in a random effect.
> set.seed(234234)
> dat5 <- data.frame(ID = rep(1:89, 2), x = gl(2, 89, labels = c("pre",
> "post")))
> err <- rnorm(length(dat5$x), 0, 1)
> b <- rep(rnorm(89, 0, 1), 2)
> beta <- 4
> dat5$y <- ifelse(dat5$x == "pre", 40 + err, 40 + beta + err) + b
> dat5 <- dat5[!(dat5$ID %in%  20:59 & dat5$x == "post"), ]
> dat5 <- dat5[!(dat5$ID %in%  60:89 & dat5$x == "pre"), ]
> m2 <- lmer(y ~ x + (1 | ID), dat5)
> summary(m2)
>
> > summary(m2)
> Linear mixed model fit by REML ['lmerMod']
> Formula: y ~ x + (1 | ID)
>    Data: dat5
>
> REML criterion at convergence: 367.1
>
> Scaled residuals:
>     Min      1Q  Median      3Q     Max
> -1.8537 -0.3634 -0.0253  0.3611  2.1063
>
> Random effects:
>  Groups   Name        Variance Std.Dev.
>  ID       (Intercept) 1.2441   1.1154
>  Residual             0.6632   0.8144
> Number of obs: 108, groups:  ID, 89
>
> Fixed effects:
>             Estimate Std. Error t value
> (Intercept)  39.9205     0.1704  234.29
> xpost         4.1333     0.2069   19.98
>
>
> This "seems" to work reasonably.  What dangers await?
>
>
> --
> Paul E. Johnson   http://pj.freefaculty.org
> Director, Center for Research Methods and Data Analysis
> http://crmda.ku.edu
>
> To write to me directly, please address me at pauljohn at ku.edu.
>
