[R] imputation in mice

Sat Dec 8 21:45:42 CET 2012

What do 

> str(data)
> summary(data)

show you? str() reports the type of each variable, and summary() gives
the range of each variable and the number of missing values, if any.
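
As a rough sketch of that check, assuming a small made-up data frame (the
name "dat" and its columns are hypothetical stand-ins, not your data), the
NA counts per column can be pulled out directly:

```r
# Toy data frame; 'dat' and its columns are hypothetical stand-ins
dat <- data.frame(
  assignment = c(1, 0, 1, 0, 1),
  totalexp   = c(3, NA, 10, 7, NA),
  urbanicity = factor(c("city", "rural", NA, "city", "town"))
)

str(dat)       # variable types
summary(dat)   # ranges, plus an NA count for each column that has any

# Number of missing values in each column
na.counts <- colSums(is.na(dat))
print(na.counts)

# Number of rows with no missing values in any column
n.complete <- sum(complete.cases(dat))
print(n.complete)
```

Running the same two counts on your own data frame right before the
matchit() or glm() call should show which columns still contain NAs.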

You seem to be overwriting your original data frame "data" (a poor name,
since data() is a function in R) after the imputation. Your code does not
show where "data" comes from originally. The "weight" variable also seems
to exist in something called "lbdata". The error messages suggest that
what is in "data" when you try to compute your propensity scores is not
what you think it is.
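
To illustrate with simulated data (the names "dat", "x", and "y" are made
up for this sketch and are not from your code): glm() drops incomplete
rows by default, so fitted() returns fewer values than the data frame has
rows. That would be consistent with your "replacement has 15934 rows,
data has 16844" message. Passing na.action = na.exclude pads the fitted
values with NA so the lengths match. It does not fix the missing data
itself.

```r
set.seed(1)
n <- 50
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(dat$x))
dat$x[1:5] <- NA                     # put missing values in the covariate

# Check what is actually in the data frame before modelling
print(anyNA(dat))
print(colSums(is.na(dat)))

# Default na.action (na.omit): the 5 incomplete rows are dropped
fit <- glm(y ~ x, family = binomial, data = dat)
n.fitted <- length(fitted(fit))
print(c(rows = nrow(dat), fitted = n.fitted))

# na.exclude keeps one fitted value per row, NA where a row was dropped
fit2 <- glm(y ~ x, family = binomial, data = dat, na.action = na.exclude)
ps <- fitted(fit2)
dat$propensityscores <- ps           # lengths now agree
```

If anyNA() is still TRUE after your imputation step, the object passed to
matchit() or glm() is not the completed data you expect.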

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Elizabeth Fuller Bettini
> Sent: Friday, December 07, 2012 10:55 PM
> To: r-help at r-project.org
> Subject: [R] imputation in mice
> 
> Hello!  If I understand this listserve correctly, I can email this
> address to get help when I am struggling with code.  If this is
> inaccurate, please let me know, and I will unsubscribe.
> I have been struggling with the same error message for a while, and I
> can't seem to get past it.
> Here is the issue:
> I am using a data set that uses -1:-9 to indicate various kinds of
> missing data.  I changed all of these to NA, regardless of the cause of
> the missing data. I am trying to do propensity score matching with this
> data, but it will not calculate the propensity scores, regardless of
> which method I have tried. I have tried the following methods:
> 1. Optimal propensity score matching, using the MatchIt library:
> m.out <- matchit(assignment ~ totalexp + yrschool + new + cert + age +
>     STratio + percminority + urbanicity + povproblem + numthreats +
>     numbattack + weight,
>     data = data, distance = "logit", method = "optimal", ratio = 1)
> 2. Nearest neighbor propensity score matching, using the MatchIt
> library:
> mout <- matchit(assignment ~ totalexp + yrschool + new + cert + age +
>     STratio + percminority + urbanicity + povproblem + numthreats +
>     numbattack,
>     distance = "logit", replace = T, data = data, method = "nearest",
>     m.order = "largest", caliper = 0.10)
> 3. Just calculating the propensity scores using the glm function:
> ps.model = glm(assignment ~ totalexp + yrschool + new + cert + age +
>     STratio + percminority + urbanicity + povproblem + numthreats +
>     numbattack,
>     family = "binomial", data = data)
> data$propensityscores = fitted(ps.model)
> 
> In each case, I have tried running the code after having performed zero
> imputations, 1 imputation, and 5 imputations.  A colleague looked at my
> code and assured me that I was doing the imputations correctly.
> However, even after performing the imputation, one of the continuous
> variables still has NAs.  This is the code that I am using for 5
> imputations:
> library(mice)
> #Remove weights
> data$weight<-NULL
> #perform the imputation
> imputed.data = mice(data,  m = 5, diagnostics = F)
> #reinsert the weights
> imputed.data.final=complete(imputed.data)
> imputed.data.final$weight=lbdata$weight
> #rename the imputed dataset "data"
> data = imputed.data.final
> 
> When I perform optimal propensity score matching or nearest neighbor
> matching (regardless of how many imputations I perform), I get the
> following error:
> Error in matchit(assignment ~ totalexp + yrschool + new + cert + age + :
>   Missing values exist in the data
> I tried running these with just two of the categorical covariates, but
> I still got this error, even though there is no missing data for those
> variables.
> 
> When I perform the glm function to get the propensity scores, I get
> this error, indicating that, for some reason, it is reducing the number
> of rows in my data set, which makes me think that it is doing list-wise
> deletion:
> Error in `$<-.data.frame`(`*tmp*`, "propensityscores", value =
> c(0.116801691392172,  :
>   replacement has 15934 rows, data has 16844
> However, this method works if I remove the covariate that has missing
> data.
> 
> 
> So, I guess my question is, how do I get the code to impute for the
> variable that it is not imputing?  Or, do I just need to chuck this
> variable?  And, if I just need to chuck this variable, how do I get the
> optimal propensity score method to work?  Currently it doesn't work
> even when I chuck this variable.
> 
> Thank you for any help or advice!
> Liz