Donald B. Rubin (1987)
Multiple Imputation for Nonresponse in Surveys;
Wiley. http://dx.doi.org/10.1002/9780470316696; available via ETH Library.
Roderick J.A. Little and Donald B. Rubin (2002)
Statistical Analysis with Missing Data (Wiley; 2nd ed.). Freely available (chapter-wise) via ETH Library.
norm, cat, mix, and pan: see below.
Stef van Buuren (2012)
Flexible Imputation of Missing Data. CRC Chapman & Hall (Taylor & Francis). Online via ETH Library.
Applied; much R code, based on R package mice (see below) → SvB’s Multiple-Imputation.com website.
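Rubin (1987) is the source of the standard rules for combining the m analyses of multiply imputed data sets. Below is a minimal base-R sketch of those rules for a single scalar estimate. The function name pool_rubin() is hypothetical. In practice, prefer the combining functions of the packages discussed on this page (e.g. pool() in mice, MIcombine() in mitools): they handle vectors of estimates and may use refined degrees of freedom.

```r
# Rubin's (1987) rules for combining m estimates of one scalar quantity.
# est: the m point estimates; se: their standard errors (one per imputed data set).
# pool_rubin() is a hypothetical illustration, not a package function.
pool_rubin <- function(est, se) {
  m <- length(est)
  stopifnot(m >= 2, length(se) == m)
  qbar <- mean(est)                 # pooled point estimate
  ubar <- mean(se^2)                # average within-imputation variance
  b    <- var(est)                  # between-imputation variance
  tvar <- ubar + (1 + 1 / m) * b    # total variance
  r    <- (1 + 1 / m) * b / ubar    # relative increase in variance due to nonresponse
  df   <- (m - 1) * (1 + 1 / r)^2   # Rubin's (1987) degrees of freedom
  list(estimate = qbar, se = sqrt(tvar), df = df)
}

# Example with made-up numbers for m = 5 imputations
pool_rubin(est = c(1.9, 2.1, 2.0, 2.2, 1.8),
           se  = c(0.30, 0.32, 0.31, 0.29, 0.30))
```

With these illustrative numbers, the pooled estimate is the plain average 2.0. The pooled standard error (about 0.35) is larger than each individual one because the between-imputation variance is added.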
R package smcfcs (developed on http://github.com/jwb133/smcfcs) with publication
Bartlett JW, Seaman SR, White IR, Carpenter JR (2015). Multiple imputation of covariates by fully conditional specification: Accommodating the substantive model.
Statistical Methods in Medical Research 24(4): 462-487. URL http://doi.org/10.1177/0962280214521348
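Both smcfcs and mice build on fully conditional specification (FCS), also called chained equations. Each incomplete variable is imputed in turn from a regression on all the others, cycling until the imputations stabilize. The base-R sketch below shows only that loop, for numeric columns. The names fcs_impute() and n_iter are hypothetical. The sketch deliberately simplifies in two ways:

- The regression parameters are not drawn from their posterior.
- Nothing makes the imputation models compatible with a substantive model, which is exactly what smcfcs adds.

```r
# Chained-equations (FCS) loop for a data frame of numeric columns.
# Simplified sketch (assumption): stochastic regression imputation at each
# step (fitted value + normal noise), without drawing regression parameters.
fcs_impute <- function(df, n_iter = 10) {
  miss <- lapply(df, is.na)                   # original NA positions per column
  for (v in names(df)) {                      # start: fill NAs with column means
    df[[v]][miss[[v]]] <- mean(df[[v]], na.rm = TRUE)
  }
  for (it in seq_len(n_iter)) {
    for (v in names(df)) {
      mis_v <- miss[[v]]
      if (!any(mis_v)) next
      form <- reformulate(setdiff(names(df), v), response = v)
      # regress v on all other (currently completed) columns, rows where v was observed
      fit  <- lm(form, data = df[!mis_v, , drop = FALSE])
      pred <- predict(fit, newdata = df[mis_v, , drop = FALSE])
      df[[v]][mis_v] <- pred + rnorm(sum(mis_v), sd = sigma(fit))
    }
  }
  df
}

# Example: one completed data set; repeat with other seeds for further ones
set.seed(1)
d <- data.frame(x = rnorm(100))
d$y <- 2 * d$x + rnorm(100, sd = 0.5)
d$y[sample(100, 20)] <- NA
completed <- fcs_impute(d)
```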
Based on missing.pdf paper, Hastie et al. (1999).
The CRAN Task View Multivariate has a section Missing data (not quite comprehensive, annotated by MM):
mitools provides tools for multiple imputation, by Thomas Lumley (R core, also author of survey).
mice provides Multivariate Imputation by Chained Equations. Written by Stef van Buuren, it is also the basis of his book.
VIM (Visualization and Imputation of Missing Values): visualization, e.g., matrixplot(), scattmatrixMiss(); imputation, e.g., kNN().
citation(package="VIM")
##
## To cite package 'VIM' in publications use:
##
## Matthias Templ, Andreas Alfons, Alexander Kowarik and Bernd
## Prantner (2015). VIM: Visualization and Imputation of Missing
## Values. R package version 4.4.1.
## http://CRAN.R-project.org/package=VIM
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {VIM: Visualization and Imputation of Missing Values},
## author = {Matthias Templ and Andreas Alfons and Alexander Kowarik and Bernd Prantner},
## year = {2015},
## note = {R package version 4.4.1},
## url = {http://CRAN.R-project.org/package=VIM},
## }
##
## ATTENTION: This citation information has been auto-generated from
## the package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.
aregImpute() and transcan() from Hmisc provide further imputation methods.
system.file(package="pan")
## [1] "/sfs/s/linux/rhel3_amd64/app/R/R_local/library_F22/pan"
All four of these (norm, cat, mix, pan) are available on CRAN (their full descriptions are no longer shown here):
sapply(c("norm", "cat", "mix", "pan"), packageDescription)
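Several packages in this list impute by nearest neighbours (e.g. kNN() in VIM). The idea fits in a few lines of base R. For each incomplete row, find the k closest complete rows using only the variables observed in that row, then fill the missing entries from those donors. knn_impute() is a hypothetical helper with these assumptions:

- The data are all numeric.
- Distance is Euclidean on standardized columns.
- The imputed value is the donor mean.

VIM's kNN() differs: it uses a modification of the Gower distance that also covers categorical variables.

```r
# k-nearest-neighbour imputation for an all-numeric matrix (hypothetical sketch).
knn_impute <- function(x, k = 5) {
  x <- as.matrix(x)
  z <- scale(x)                              # standardize columns (NAs are skipped)
  donors <- which(complete.cases(x))         # only complete rows serve as donors
  stopifnot(length(donors) >= k)
  for (i in which(!complete.cases(x))) {
    obs_cols  <- which(!is.na(x[i, ]))
    miss_cols <- which(is.na(x[i, ]))
    # squared Euclidean distance to each donor, over the columns observed in row i
    d2 <- colSums((t(z[donors, obs_cols, drop = FALSE]) - z[i, obs_cols])^2)
    nn <- donors[order(d2)[seq_len(k)]]      # the k closest donors
    x[i, miss_cols] <- colMeans(x[nn, miss_cols, drop = FALSE])
  }
  x
}

# Example: the missing b in row 6 becomes the mean of b in rows 5 and 4, i.e. 45
m <- cbind(a = 1:6, b = c(10, 20, 30, 40, 50, NA))
knn_impute(m, k = 2)
```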
The CRAN Task View OfficialStatistics is considerably more comprehensive (than the Multivariate one):
A distinction is made between iterative model-based methods, k-nearest neighbor methods and miscellaneous methods. However, the criteria for choosing a method often depend on the scale of the data, which in official statistics are typically a mixture of continuous, semi-continuous, binary, categorical and count variables. In addition, measurement errors may corrupt non-robust imputation methods. Note that only a few imputation methods can deal with mixed types of variables, and only a few account for robustness issues.
EM-based Imputation Methods:
mi provides iterative EM-based multiple Bayesian regression imputation of missing values and model checking of the regression models used. The regression models for each variable can also be user-defined. The data set may consist of continuous, semi-continuous, binary, categorical and/or count variables.
mice provides iterative EM-based multiple regression imputation. The data set may consist of continuous, binary, categorical and/or count variables.
mitools provides tools to perform analyses and combine results from multiply-imputed datasets.
Amelia provides multiple imputation in which bootstrap samples with the same dimensions as the original data are first drawn and then used for EM-based imputation. It is also possible to impute longitudinal data. The package also comes with a graphical user interface.
VIM provides EM-based multiple imputation (function irmi()) using robust estimation, which allows it to deal adequately with data that include outliers. It can handle data consisting of continuous, semi-continuous, binary, categorical and/or count variables.
mix provides iterative EM-based multiple regression imputation. The data set may consist of continuous, binary or categorical variables, but methods for semi-continuous variables are missing.
pan provides multiple imputation for multivariate panel or clustered data.
norm provides EM-based multiple imputation for multivariate normal data.
cat provides EM-based multiple imputation for multivariate categorical data.
MImix provides tools to combine results for multiply-imputed data using mixture approximations.
robCompositions provides iterative model-based imputation for compositional data (function impCoda()).
Nearest Neighbor Imputation Methods:
VIM provides an implementation of the popular sequential and random (within a domain) hot-deck algorithm. VIM also provides a fast k-nearest neighbor (knn) algorithm that can be used for large data sets. It uses a modification of the Gower distance for numerical, categorical, ordered, continuous and semi-continuous variables.
yaImpute performs popular nearest neighbor routines for imputation of continuous variables, where different metrics and methods can be used to determine the distance between observations.
robCompositions provides knn imputation for compositional data (function impKNNa()) using the Aitchison distance and adjustment of the nearest neighbor.
rrcovNA provides an algorithm for (robust) sequential imputation (functions impSeq() and impSeqRob()) by minimizing the determinant of the covariance of the augmented data matrix. Its application is limited to continuous scaled data.
impute on Bioconductor provides knn imputation of continuous variables.
Copula-based Imputation Methods:
CoImp imputes multivariate missing data using conditional copula functions. The imputation procedure is semiparametric: the margins are non-parametrically estimated through local likelihood of low-degree polynomials, while a range of different parametric models for the copula can be selected by the user. The missing values are imputed by drawing observations from the conditional density functions by means of the Hit or Miss Monte Carlo method. It works either for a matrix of continuous scaled variables or for a matrix of discrete distributions.
Miscellaneous Imputation Methods:
missMDA imputes incomplete continuous variables by principal component analysis (PCA) or categorical variables by multiple correspondence analysis (MCA).
mice (function mice.impute.pmm()) and package Hmisc (function aregImpute()) allow predictive mean matching imputation.
VIM allows visualizing the structure of missing values using suitable plot methods. It also comes with a graphical user interface.
Title: A General Imputation Framework in R
Description: General imputation framework based on variable selection methods including regularisation methods, tree-based models and dimension reduction methods.
Version: 1.0.0
Published: 2014-05-14
Author: Lingbing Feng, Gen Nowak, Alan. H. Welsh, Terry. J. O'Neill
Title: Matrix Completion via Iterative Soft-Thresholded SVD
Version: 1.4
Date: 2015-2-13
Author: Trevor Hastie and Rahul Mazumder
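The title above names the technique: matrix completion via iterative soft-thresholded SVD. The algorithm works as follows:

1. Start from an all-zero estimate.
2. Fill the missing entries with the current low-rank estimate.
3. Take an SVD of the filled matrix and shrink every singular value by lambda; values below lambda become zero.
4. Repeat steps 2 and 3 until the estimate stops changing.

Below is a minimal base-R sketch. soft_impute(), lambda, max_iter and tol are hypothetical names, not the interface of the package described above. The sketch makes no attempt at the efficiency needed for large matrices.

```r
# Matrix completion by iterative soft-thresholded SVD (hypothetical sketch).
soft_impute <- function(x, lambda, max_iter = 500, tol = 1e-7) {
  x <- as.matrix(x)
  miss <- is.na(x)
  z <- matrix(0, nrow(x), ncol(x))        # current low-rank estimate
  for (it in seq_len(max_iter)) {
    filled <- ifelse(miss, z, x)          # observed entries from x, missing ones from z
    s <- svd(filled)
    d <- pmax(s$d - lambda, 0)            # soft-threshold the singular values
    z_new <- s$u %*% (d * t(s$v))         # U diag(d) V'
    delta <- sum((z_new - z)^2) / max(sum(z^2), 1e-12)  # relative change
    z <- z_new
    if (delta < tol) break
  }
  out <- x
  out[miss] <- z[miss]                    # keep observed entries, fill the rest
  out
}

# Example: complete a rank-1 matrix with three missing entries
truth <- outer(1:6, 1:5)
x <- truth
x[cbind(c(2, 5, 6), c(3, 1, 5))] <- NA
round(soft_impute(x, lambda = 0.1), 2)
```

A larger lambda gives lower-rank, more strongly shrunken completions. In practice lambda would be chosen by some form of validation, e.g. on held-out observed entries.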
Blog advocating “Available Cases” (AC), notably because MI (trying Amelia only) is “slow and not better statistically”.
regtools: R package only on GitHub, with R functions ending in ac, e.g., lmac().
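Available-cases analysis (pairwise deletion) estimates each mean and covariance from all rows in which the variables involved are observed. It then solves the usual normal equations. Below is a minimal base-R sketch for linear regression. lm_available_cases() is a hypothetical name, and this is not necessarily how regtools' lmac() is implemented. Note that a pairwise covariance matrix need not be positive definite; in that case solve() can fail or give unstable slopes.

```r
# "Available cases" linear regression of y on the columns of X (hypothetical sketch).
lm_available_cases <- function(y, X) {
  X <- as.matrix(X)
  # pairwise covariances: each entry uses all rows where both variables are observed
  S   <- cov(cbind(y, X), use = "pairwise.complete.obs")
  Sxx <- S[-1, -1, drop = FALSE]
  sxy <- S[-1, 1]
  beta  <- solve(Sxx, sxy)                                   # slopes
  alpha <- mean(y, na.rm = TRUE) - sum(beta * colMeans(X, na.rm = TRUE))  # intercept
  c(intercept = alpha, beta)
}

# Example with some predictor values missing
set.seed(1)
X <- cbind(x1 = rnorm(100), x2 = rnorm(100))
y <- drop(1 + X %*% c(2, -1) + rnorm(100))
X[sample(100, 15), "x1"] <- NA
lm_available_cases(y, X)
```

With no missing values, this reduces exactly to ordinary least squares.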
mice, mi, and Amelia (Rmd rendered web pages): “There are three main R packages … multiple imputation techniques.”
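mice (function mice.impute.pmm()) and Hmisc (aregImpute()) offer predictive mean matching (PMM). A regression is fitted on the observed cases. Each missing value is then replaced by the observed value of a donor whose predicted mean is among the k closest to that of the incomplete case, so only actually observed values are ever imputed. Below is a minimal base-R sketch for one numeric variable with complete numeric predictors. pmm_impute() and k are hypothetical names. Unlike the Bayesian variant used for proper multiple imputation, the sketch does not draw the regression coefficients.

```r
# Predictive mean matching for one numeric variable y (hypothetical sketch).
# X: fully observed numeric predictors; k: number of candidate donors.
pmm_impute <- function(y, X, k = 5) {
  X   <- cbind(1, as.matrix(X))                 # design matrix with intercept
  obs <- !is.na(y)
  beta <- lm.fit(X[obs, , drop = FALSE], y[obs])$coefficients
  yhat <- drop(X %*% beta)                      # predicted means for all cases
  for (i in which(!obs)) {
    # the k observed cases whose predicted means are closest to case i's
    cand <- which(obs)[order(abs(yhat[obs] - yhat[i]))[seq_len(min(k, sum(obs)))]]
    y[i] <- y[cand[sample.int(length(cand), 1)]]  # copy one donor's observed value
  }
  y
}

# Example: impute the first 10 values of y from a single predictor
set.seed(1)
x <- rnorm(50)
y <- 3 + 2 * x + rnorm(50, sd = 0.3)
y[1:10] <- NA
y_imp <- pmm_impute(y, x)
```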