[R] Is there a good package for multiple imputation of missi

(Ted Harding) Ted.Harding at manchester.ac.uk
Mon Jun 30 22:36:37 CEST 2008


I'm surprised that 'norm' was suggested, given your clear statement
that you have both categorical and continuous variables.

The right package (within the "Schafer stable" of cat, norm, mix, and pan)
is the 'mix' package. It is expressly written for this case of
mixed variable types.

You would need to make sure that all the categorical variables are in
the left-hand columns of the data matrix (and that it is a matrix,
not a dataframe), to read the documentation fully and carefully, and
also to bear in mind that (as a result of the way Schafer indexes the
positions of missing values in the rows) you can have at most 31 columns
of categorical variables, and 31 columns of continuous variables.
And also bear in mind that the model for the continuous variables
is multivariate Normal, the same covariance matrix for all cases,
and the vector of means for a case depending on the levels of the
categorical variables in that case. If all that fits your situation,
then 'mix' is well worth considering.
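
For what it's worth, the data preparation described above might be
sketched roughly as follows. The helper to_mix_matrix() and the 'toy'
data frame are made up for illustration and are not part of 'mix'. The
calls to prelim.mix(), em.mix(), rngseed(), da.mix() and imp.mix() follow
the pattern in the package's help pages as I understand them, so do check
the documentation before relying on this.

```r
# Hypothetical helper (not part of 'mix'): put the factor columns first,
# recode them as integers 1..k, and return a numeric matrix plus p,
# the number of categorical columns.
to_mix_matrix <- function(df) {
  is_cat   <- vapply(df, is.factor, logical(1))
  cat_part <- lapply(df[is_cat], function(f) as.integer(droplevels(f)))
  num_part <- lapply(df[!is_cat], as.numeric)
  x <- do.call(cbind, c(cat_part, num_part))  # categorical columns on the left
  list(x = x, p = sum(is_cat))
}

# Made-up data with NAs in both kinds of column (illustration only)
set.seed(1)
n <- 200
toy <- data.frame(
  age    = rnorm(n, 40, 10),
  gender = factor(sample(c("f", "m"), n, replace = TRUE)),
  income = rnorm(n, 50, 15),
  felony = factor(sample(c("no", "yes"), n, replace = TRUE))
)
toy$age[sample(n, 20)]    <- NA
toy$gender[sample(n, 20)] <- NA
toy$felony[sample(n, 20)] <- NA

prep <- to_mix_matrix(toy)
colnames(prep$x)  # categorical columns now come first

# Only run the imputation itself if 'mix' is installed
if (requireNamespace("mix", quietly = TRUE)) {
  library(mix)
  s     <- prelim.mix(prep$x, prep$p)          # preliminary bookkeeping
  theta <- em.mix(s, showits = FALSE)          # EM estimate as starting value
  rngseed(1234567)                             # seed is required before da.mix()
  theta <- da.mix(s, theta, steps = 100)       # data augmentation
  x_imp <- imp.mix(s, theta, prep$x)           # one completed data matrix
}
```

Repeating the da.mix()/imp.mix() step gives further imputed data sets.
The imputed categorical columns come back as integer codes, so they
need to be mapped back to factor levels afterwards.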

Hoping this helps,
Ted.

On 30-Jun-08 17:55:26, Robert A LaBudde wrote:
> At 03:02 AM 6/30/2008, Robert A. LaBudde wrote:
>>I'm looking for a package that has a state-of-the-art method of 
>>imputation of missing values in a data frame with both continuous 
>>and factor columns.
>>
>>I've found transcan() in 'Hmisc', which appears to be possibly 
>>suited to my needs, but I haven't been able to figure out how to get 
>>a new data frame with the imputed values replaced (I don't have 
>>Harrell's book).
>>
>>Any pointers would be appreciated.
> 
> Thanks to "paulandpen", Frank and Shige for suggestions.
> 
> I looked at the packages 'Hmisc', 'mice', 'Amelia' and 'norm'.
> 
> I still haven't mastered the methodology for using aregImpute() in 
> 'Hmisc' based on the help information. I think I'll have to get hold 
> of Frank's book to see how it's used in a complete example.
> 
> 'Amelia' and 'norm' appear to be focused solely on continuous, 
> multivariate normal variables, but my needs typically involve 
> datasets with both factors and continuous variables.
> 
> The function mice() in 'mice' appears to best suit my needs, and the 
> help file was intelligible, and it works on both factors and 
> continuous variables.
> 
> For those in the audience with similar issues, here is a code snippet 
> showing how some of these functions work ('felon' is a data frame 
> with categorical and continuous predictors of the binary variable
> 'hired'):
> 
> library('mice') #missing data imputation library for md.pattern(), mice(), complete()
> names(felon)  #show variable names
> md.pattern(felon[,1:4]) #show patterns for missing data in 1st 4 vars
> 
> library('Hmisc')  #package for na.pattern() and impute()
> na.pattern(felon[,1:4]) #show patterns for missing data in 1st 4 vars
> 
> #simple imputation can be done by
> felon2<- felon  #make copy
> felon2$felony<- impute(felon2$felony) #impute NAs (most frequent)
> felon2$gender<- impute(felon2$gender) #impute NAs
> felon2$natamer<- impute(felon2$natamer) #impute NAs
> na.pattern(felon2[,1:4]) #show no NAs left in these vars
> fit2<- glm(hired ~ felony + gender + natamer, data=felon2,
> family=binomial)
> summary(fit2)
> 
> #better, multiple imputation can be done via mice():
> imp<- mice(felon[,1:4]) #do multiple imputation (default is 5 realizations)
> for (iSet in 1:5) {  #show results for the 5 imputation datasets
>    #fit to iSet-th realization
>    fit<- glm(hired ~ felony + gender + natamer,
>      data=complete(imp, iSet), family=binomial)
>    print(summary(fit))
> }
> 
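
A note on the loop above: it prints five separate fits, and the usual
next step is to combine them by Rubin's rules. The following is only a
sketch in base R. The function pool_rubin() and the numbers fed to it
are made up for illustration and do not come from the 'felon' data.

```r
# Hypothetical helper: combine one coefficient across m imputed-data fits.
#   est : the m point estimates
#   se  : the m standard errors
pool_rubin <- function(est, se) {
  m     <- length(est)
  qbar  <- mean(est)                # pooled estimate
  ubar  <- mean(se^2)               # within-imputation variance
  b     <- var(est)                 # between-imputation variance
  total <- ubar + (1 + 1 / m) * b   # total variance
  list(estimate = qbar, se = sqrt(total), within = ubar, between = b)
}

# Made-up estimates and standard errors from m = 3 imputations
res <- pool_rubin(est = c(1, 2, 3), se = c(1, 1, 1))
res$estimate
res$se

# With a 'mids' object such as 'imp', current versions of 'mice' can do
# the pooling directly (not run here):
#   fit <- with(imp, glm(hired ~ felony + gender + natamer, family = binomial))
#   summary(pool(fit))
```

The pooled standard error is larger than any single-fit standard error
whenever the estimates differ between imputations, which is how the
uncertainty due to the missing values enters the final result.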
> ================================================================
> Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
> Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
> 824 Timberlake Drive                     Tel: 757-467-0954
> Virginia Beach, VA 23464-3239            Fax: 757-467-2947
> 
> "Vere scire est per causas scire"
> 

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 30-Jun-08                                       Time: 21:36:34
------------------------------ XFMail ------------------------------
