# [R-sig-ME] Handling missing data through Gaussian mixture model

jas ni jasnie111 at gmail.com
Fri Sep 9 18:47:22 CEST 2016

```Dear all,

I'm currently trying to write the missing data imputation code through
Gaussian mixture model. My current reference as provided in the attachment
or you can reach at this url:
http://www.sciencedirect.com/science/article/pii/S0167947306003707

This is the code for e-step and m-step through the Gaussian mixture model:

data(faithful);
library(mvtnorm); # using multivariate normal package (need to install this
but easy .... )
N = dim(faithful); # number of sample points
X = faithful[,1:2]; # the data matrix
alpha1 = alpha2 = .5; # these are our initial class probability
m1 = c(2,90);   # initial means, chosen to be bad
m2 = c(4,50);
Sigma1 = Sigma2 = diag(c(var(X[,1]),var(X[,2])));   # initial covariance
matrices computed from entire data

e_step<-function(x,m1,m2,Sigma1,Sigma2,alpha){

x_missing<-x
x_missing\$eruptions[1:10] <- NA
x_missing\$waiting[10:20] <- NA
x_missing\$eruptions[is.na(x_missing\$eruptions)] =
mean(x_missing\$eruptions, na.rm=TRUE)
x_missing\$waiting[is.na(x_missing\$waiting)] = mean(x_missing\$waiting,
na.rm=TRUE)
x<-x_missing

comp1.prod <- dmvnorm(x, m1, Sigma1) * alpha
comp2.prod <- dmvnorm(x, m2, Sigma2) * alpha
sum.of.comps <- comp1.prod + comp2.prod
comp1.post <- comp1.prod / sum.of.comps
comp2.post <- comp2.prod / sum.of.comps
ll = sum(log(sum.of.comps))

list("ll" = ll,
"posterior.df" = cbind(comp1.post, comp2.post))
}

m_step<-function(x,posterior.df){
comp1.n <- sum(posterior.df[, 1])
comp2.n <- sum(posterior.df[, 2])

comp1.alpha <- comp1.n / length(x)
comp2.alpha <- comp2.n / length(x)

comp1.mu <- colSums(posterior.df[, 1]*X)/comp1.n
comp2.mu <- colSums(posterior.df[, 2]*X)/comp2.n

resp1=posterior.df[, 1]
resp2=posterior.df[, 2]
acc1 = acc2 = matrix(0,nrow=2,ncol=2);
Y = as.matrix(x);
for (n in 1:N) {
acc1 = acc1 +  resp1[n] * ((Y[n,] - m1)  %*% t(Y[n,]-m1));
acc2 = acc2 + resp2[n] * ((Y[n,] - m2)  %*% t(Y[n,]-m2));
}
Sigma1 = acc1/sum(resp1); Sigma2 = acc2/sum(resp1);

list("m1" = comp1.mu,
"m2"   = comp2.mu,
"Sigma1" = Sigma1,
"Sigma2" = Sigma2,
"alpha" = c(comp1.alpha, comp2.alpha))
}

I have noticed that mean imputation part in the e-step is not in right way.
Therefore i want to apply the random draw imputation as mentioned in the
paper at page number 5308.

Can anybody here guide me on how to write the random draw imputation in R?
Thank you in advance
-Jas

[[alternative HTML version deleted]]

```