[R-sig-ME] EM and Missing Data in R

Thu Jan 22 15:21:11 CET 2009

Kyle:

I realize that in HLM circles it is common to use the term "levels", but
this is quite confusing and, in fact, misleading. In a multilevel
model, there are multiple levels of random variation (many variance
components), but there are not multiple levels of fixed effects. These
are linear models with additive fixed and random effects. That is, there
are random effects, and everything else is just a covariate; there are
no levels associated with covariates. So, now let's consider your
question.

In matrix notation the model is Y = XB + Zu + e, where X is a known
model matrix, B is the vector of fixed-effects coefficients, Z is also a
model matrix, u is the vector of random effects, and e is the residual
error.

Now, u is completely unobserved (and B is unknown). If u were observed,
solving for B would be easy. That is, if we had the complete data, the
maximization problem would be simple. But this is a missing data
problem, and so some process is necessary to help us along. That is what
EM does. It can be used to fill in the missing data in the vector u to
form a complete-data problem and then perform the maximization w.r.t. B.
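
As a rough sketch of what one EM iteration looks like for B, under
assumptions I have not stated above (u ~ N(0, G), e ~ N(0, sigma^2 I), u
independent of e, and G and sigma^2 held fixed for the moment), the two
steps could be written as follows. The symbols G, V, sigma^2, and the
iteration index t are introduced here only for illustration.

```latex
% Assumed for illustration: u ~ N(0, G), e ~ N(0, \sigma^2 I), u independent of e,
% with G and \sigma^2 treated as known while B is updated.
\begin{align*}
V &= Z G Z^\top + \sigma^2 I \\
\text{E-step:}\quad \hat{u}^{(t)} &= \mathrm{E}\!\left[u \mid Y, B^{(t)}\right]
   = G Z^\top V^{-1} \left(Y - X B^{(t)}\right) \\
\text{M-step:}\quad B^{(t+1)} &= \left(X^\top X\right)^{-1} X^\top \left(Y - Z \hat{u}^{(t)}\right)
\end{align*}
```

In a real problem G and sigma^2 are unknown as well, and their M-step
updates also need the conditional variance of u, not just its mean.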

So yes, EM is a useful tool for missing data problems. EM, I think, is
easily programmable in R. But because EM is a general algorithm, I think
the best path for you is to go to the Dempster et al. paper to understand
how it works and how it can be applied to missing data problems. Then
you need to consider how it will work with your specific problem,
work out the conditional expectations, and then maximize (which is often
the easiest part).
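
To give a feel for the "work out the conditional expectations, then
maximize" recipe, here is a minimal sketch for the simplest special case
of the model above: a random-intercept model y_ij = mu + u_j + e_ij with
u_j ~ N(0, tau2) and e_ij ~ N(0, sigma2). It is written in plain Python
rather than R only to keep it dependency-free; the function name, the
variable names, and the toy data are all made up for illustration, and
this is not how any particular mixed-model package fits the model.

```python
# Sketch: EM for y_ij = mu + u_j + e_ij, treating the u_j as missing data.
# Assumes u_j ~ N(0, tau2) and e_ij ~ N(0, sigma2); all names are illustrative.

def em_random_intercept(groups, n_iter=500):
    """groups: list of lists, one inner list of responses per group."""
    n_obs = sum(len(g) for g in groups)
    mu = sum(sum(g) for g in groups) / n_obs  # start at the grand mean
    tau2, sigma2 = 1.0, 1.0                   # arbitrary positive starting values

    for _ in range(n_iter):
        # E-step: posterior mean m and variance v of each u_j given the data
        # and the current parameter values.
        post = []
        for g in groups:
            v = 1.0 / (len(g) / sigma2 + 1.0 / tau2)
            m = v * sum(y - mu for y in g) / sigma2
            post.append((m, v))

        # M-step: maximize the expected complete-data log-likelihood.
        mu = sum(y - m for g, (m, _) in zip(groups, post) for y in g) / n_obs
        tau2 = sum(m * m + v for m, v in post) / len(groups)
        sigma2 = sum((y - mu - m) ** 2 + v
                     for g, (m, v) in zip(groups, post) for y in g) / n_obs

    # post holds the conditional means from the last E-step.
    return mu, tau2, sigma2, [m for m, _ in post]


# Toy balanced data: three groups of three observations each.
toy = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
mu_hat, tau2_hat, sigma2_hat, u_hat = em_random_intercept(toy)
print(mu_hat, tau2_hat, sigma2_hat, u_hat)
```

On this toy data the estimates should settle near mu = 5, tau2 = 17/3,
and sigma2 = 1. Note that the variance updates use v as well as m;
plugging in only the conditional means would understate both variance
components.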

HTH,
Harold

> -----Original Message-----
> From: r-sig-mixed-models-bounces at r-project.org 
> [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf 
> Of Roberts, Kyle
> Sent: Wednesday, January 21, 2009 10:56 AM
> To: r-sig-mixed-models at r-project.org
> Subject: [R-sig-ME] EM and Missing Data in R
> 
> Friends,
> 
> Do you know of any way to use the EM algorithm to do 
> imputation for missing data at the second level in R? I have 
> heard some things about this at conferences, but can't put my 
> fingers on the actual references. I have a student who is 
> looking at missing data treatments for level-2 variables. I 
> haven't done any research in this area, but I want to point 
> her in the right direction.
> 
> If this is a "nonsensical"-type question, please forgive my naivety!
> 
> Thanks for your instruction.
> 
> Blessings,
> Kyle
> 
> *********************************************************
> Dr. J. Kyle Roberts
> Department of Teaching and Learning
> Annette Caldwell Simmons School of Education
>    and Human Development
> Southern Methodist University
> P.O. Box 750381
> Dallas, TX  75275
> 214-768-4494
> http://www.hlm-online.com/
> *********************************************************