[R] Missing data and LME models and diagnostic plots
Kingsford Jones
kingsfordjones at gmail.com
Thu Oct 22 05:29:57 CEST 2009
Mixed models based on likelihood methods can often handle missing
observations within subjects, but they not do well with missing
individual elements in the design matrices (think unit nonresponse vs
item nonresponse in the survey world). Continuing with the example I
recently sent to you
set.seed(0112358)
library(nlme)
school <- factor(rep(1:20, each=4))
year <- rep(2006:2009, 20)
year.c <- year - mean(year)
tmt <- sample(0:1, 20, replace = TRUE)[school]
math <- rnorm(80, 2 + tmt + .001*year + .0001*tmt*year, 1.5) + rnorm(20)[school]
tmt <- factor(tmt)
dfr <- data.frame(math, school, tmt, year, year.c)
rm(math, school, year, year.c, tmt)
# Now create missing observations,
# -- note just a single obs for the first school:
dfr2 <- dfr[-c(2:6, sample(75, 5)), ]
table(dfr2$school)
#
# school: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
# numobs: 1 2 4 3 4 4 4 4 4 3 3 4 4 4 3 4 4 4 4 4
f3 <- lme(math ~ year.c*tmt, data = dfr2, random=~1|school)
intervals(f3)
#Approximate 95% confidence intervals
#
# Fixed effects:
# lower est. upper
#(Intercept) 3.0705975 3.81227729 4.5539571
#year.c -0.3913985 0.09107008 0.5735386
#tmt1 0.3146311 1.34895504 2.3832790
#year.c:tmt1 -0.4385680 0.20748194 0.8535319
#attr(,"label")
#[1] "Fixed effects:"
#
# Random Effects:
# Level: school
# lower est. upper
#sd((Intercept)) 0.35068 0.7334782 1.534134
#
# Within-group standard error:
# lower est. upper
#1.226240 1.492044 1.815464
Note that even with the missing obs, in this instance we got pretty
good estimates of the variance components (parameters: within-group sd
= 1.5; between-school sd = 1). However, I had to change the seed from
the one I used in the last email because, as is not uncommon with
unbalanced data and relatively small sample sizes, there were
convergence problems. If you were using classical MoM estimators
fitting a mixed model with unbalanced data would lead to very
precarious inferences (not that the likelihood inference isn't without
it share of pitfalls, but that's another email...)
hth,
Kingsford
On Wed, Oct 21, 2009 at 11:13 AM, Peter Flom
<peterflomconsulting at mindspring.com> wrote:
> Hello
>
> Running R2.9.2 on Windows XP
>
> I am puzzled by the performance of LME in situations where there are missing data. As I understand it, one of the strengths of this sort of model is how well it deals with missing data, yet lme requires nonmissing data.
>
> Thus,
>
> m1.mod1 <- lme(fixed = math_1 ~ I(year-2007.5)*TFC_,
> data = long,
> random = ~I(year-2007.5)|schoolnum)
>
> causes an error in na.fail.default, but adding na.action = "na.omit" makes a model with no errors. However, if I create that model, i.e.
>
> m1.mod1 <- lme(fixed = math_1 ~ I(year-2007.5)*TFC_,
> data = long,
> random = ~I(year-2007.5)|schoolnum,
> na.action = "na.omit")
>
> then the diagnostic plots suggested in Pinheiro & Bates produce errors; e.g.
>
> plot(m1.mod1, schoolnum~resid(.), abline = 0)
>
> gives an error "could not find function "NaAct".
>
>
> Searching the archives showed a similar question from 2007, but did not show any responses.
>
> Thanks for any help
>
> Peter
>
> )
>
> Peter L. Flom, PhD
> Statistical Consultant
> Website: www DOT peterflomconsulting DOT com
> Writing; http://www.associatedcontent.com/user/582880/peter_flom.html
> Twitter: @peterflom
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list