[R] Multiple Imputation in mice/norm

Emmanuel Charpentier charpent at bacbuc.dyndns.org
Sat Apr 25 14:34:06 CEST 2009


On Friday 24 April 2009 at 14:11 -0700, ToddPW wrote:
> I'm trying to use either mice or norm to perform multiple imputation to fill
> in some missing values in my data.  The data has some missing values because
> of a chemical detection limit (so they are left censored).  I'd like to use
> MI because I have several variables that are highly correlated.  In SAS's
> proc MI, there is an option with which you can limit the imputed values that
> are returned to some range of specified values.  Is there a way to limit the
> values in mice?  

You may do that by writing your own imputation function and assigning it
to the imputation of particular variables (see the argument
"imputationMethod" and the details in the man page for "mice").
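
A minimal sketch of such a function, under stated assumptions: the name
"lcens" and the argument "limit" are hypothetical, the (y, ry, x, wy)
signature is the one used by the mice.impute.* functions in current mice
releases (where the argument is called "method" rather than
"imputationMethod"), and, for brevity, the draws ignore the uncertainty
of the regression parameters, so this is not a fully proper imputation.

```r
# Hypothetical custom method: regression imputation whose draws are
# truncated above at a detection limit. "lcens" and "limit" are
# illustrative names, not part of mice.
mice.impute.lcens <- function(y, ry, x, wy = NULL, limit = Inf, ...) {
  if (is.null(wy)) wy <- !ry               # rows to impute
  x <- cbind(1, as.matrix(x))              # add an intercept column
  fit <- lm.fit(x[ry, , drop = FALSE], y[ry])
  df <- max(sum(ry) - ncol(x), 1)
  sigma <- sqrt(sum(fit$residuals^2) / df) # residual standard deviation
  mu <- drop(x[wy, , drop = FALSE] %*% fit$coefficients)
  # Inverse-CDF draw from N(mu, sigma) restricted to (-Inf, limit]
  u <- runif(length(mu)) * pnorm(limit, mu, sigma)
  pmin(qnorm(u, mu, sigma), limit)         # guard against rounding error
}

# Assumed usage (not run here; "dat" is a hypothetical data frame and the
# extra argument is assumed to be passed through to the method):
# imp <- mice(dat, method = "lcens", limit = log(0.5))
```

The function can be checked on its own, without mice, by calling it with
a response y, a logical vector ry flagging the observed entries, and a
matrix of predictors x.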

>                  If not, is there another MI tool in R that will allow me to
> specify a range of acceptable values for my imputed data?

In the function amelia (package "Amelia"), you can specify a "bounds"
argument, which allows for such a limitation. However, be aware that
this might violate Amelia's basic assumption, which is that your data
are multivariate normal. Maybe a change of variable is in order
(e.g. log(concentration) usually has much better statistical properties
than concentration).
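
A small sketch of how the change of variable and the bounds could be set
up, assuming (as I read the Amelia documentation) that "bounds" is a
three-column matrix with rows of the form c(column number, lower bound,
upper bound). The data, the detection limit of 0.5 and the lower floor of
1e-6 are invented for illustration.

```r
limit <- 0.5                                  # hypothetical detection limit
dat <- data.frame(conc  = c(1.2, NA, 0.8, NA, 2.1),  # NA = below the limit
                  other = c(3.1, 2.0, 2.7, 1.9, 3.5))

# Work on the log scale, where normality is more plausible
dat$log_conc <- log(dat$conc)
dat$conc <- NULL

# One row per bounded variable: column index, lower bound, upper bound.
# The lower bound log(1e-6) is an arbitrary, assumed floor.
col <- which(names(dat) == "log_conc")
bds <- matrix(c(col, log(1e-6), log(limit)), nrow = 1)

# Assumed call (not run here):
# a.out <- Amelia::amelia(dat, m = 5, bounds = bds)
```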

Frank Harrell's aregImpute (package Hmisc) has the "curtail" argument
(TRUE by default) which limits imputations to the range of observed
values.

But if your left-censored variables are your dependent variables (not
covariates), may I suggest analyzing these data as censored data, as
allowed by Terry Therneau's "survreg" function (package "survival")?
("coxph", from the same package, does not accept left-censored
responses.) Code your "missing" data as such a variable; to do this
on the fly, use:
survreg(Surv(ifelse(is.na(x), <yourlimit>, x),
             !is.na(x), type="left") ~ <Yourmodel>)
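
A minimal sketch of the censored-data analysis on simulated data. It uses
survreg (also in package "survival") with a Gaussian distribution, since
that function accepts left-censored responses. The variable names, the
detection limit and the simulated model are all invented for
illustration.

```r
library(survival)

set.seed(1)
n <- 200
z <- rnorm(n)                                  # a covariate
true_y <- 1 + 0.5 * z + rnorm(n, sd = 0.5)     # e.g. log(concentration)
limit <- 0.5                                   # hypothetical detection limit
x <- ifelse(true_y < limit, NA, true_y)        # below the limit -> missing

d <- data.frame(z = z,
                time = ifelse(is.na(x), limit, x),  # limit where censored
                observed = !is.na(x))               # FALSE = left-censored

fit <- survreg(Surv(time, observed, type = "left") ~ z,
               data = d, dist = "gaussian")
coef(fit)
```

The estimated coefficients should be close to the simulated values
(intercept 1, slope 0.5), even though about a quarter of the responses
are censored.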

Another possible idea is to split your variable (supposedly x) into two:
observed (logical) and value (the observed value if observed, <detection
limit> if not), and to include both in your model. You will probably run
into numerical difficulties due to the built-in total separation.
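
The split itself can be sketched as follows (the data and the detection
limit of 0.5 are invented for illustration):

```r
limit <- 0.5                          # hypothetical detection limit
x <- c(1.2, NA, 0.8, NA, 2.1)         # NA = below the detection limit

observed <- !is.na(x)                 # TRUE where a value was measured
value <- ifelse(observed, x, limit)   # detection limit where censored

data.frame(observed, value)
```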

HTH,

					Emmanuel Charpentier

> Thanks for the help,
> Todd
> 
>



