[R] Multiple Imputation in mice/norm

David Winsemius dwinsemius at comcast.net
Sat Apr 25 18:28:35 CEST 2009

On Apr 25, 2009, at 9:25 AM, Frank E Harrell Jr wrote:

> Emmanuel Charpentier wrote:
>> On Friday, April 24, 2009 at 14:11 -0700, ToddPW wrote:
>>> I'm trying to use either mice or norm to perform multiple imputation to fill
>>> in some missing values in my data. The data has some missing values because
>>> of a chemical detection limit (so they are left-censored). I'd like to use
>>> MI because I have several variables that are highly correlated. In SAS's
>>> proc MI, there is an option with which you can limit the imputed values that
>>> are returned to some range of specified values. Is there a way to limit the
>>> values in mice?
>> You may do that by writing your own imputation function and assigning it
>> to the imputation of a particular variable (see the argument
>> "imputationMethod" and the details in the man page for "mice").
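A minimal sketch of such a function, assuming mice's convention that a method called "name" is run as mice.impute.name(y, ry, x, ...), where y is the incomplete variable, ry is TRUE where y is observed and x is the predictor matrix (newer releases also pass wy, the rows to impute). The method name normbelow and its limit argument are hypothetical. The sketch makes a Bayesian normal-regression draw of the coefficients, then samples each imputation from a normal distribution truncated above at the detection limit, so no imputed value can exceed it. Because the regression is fitted on the detected values only, it ignores the truncation; treat it as a rough starting point, not a validated method.

```r
# Hypothetical imputation method "normbelow", following mice's convention that
# method "name" calls mice.impute.name(y, ry, x, ...):
#   y: incomplete variable; ry: TRUE where y is observed; x: predictor matrix;
#   wy: rows to impute (passed by newer mice versions; defaults to !ry here);
#   limit: assumed extra argument holding the detection limit.
mice.impute.normbelow <- function(y, ry, x, wy = NULL, limit, ...) {
  if (is.null(wy)) wy <- !ry
  X <- cbind(1, as.matrix(x))               # add an intercept column
  Xo <- X[ry, , drop = FALSE]
  yo <- y[ry]

  # Least-squares fit on the observed rows
  V <- solve(crossprod(Xo))
  beta_hat <- drop(V %*% crossprod(Xo, yo))
  resid <- yo - drop(Xo %*% beta_hat)

  # Draw sigma and beta from their approximate posterior (Bayesian normal regression)
  df <- max(length(yo) - ncol(Xo), 1)
  sigma_star <- sqrt(sum(resid^2) / rchisq(1, df))
  beta_star <- beta_hat + sigma_star * drop(t(chol(V)) %*% rnorm(ncol(Xo)))

  # Sample from N(mu, sigma_star^2) truncated to (-Inf, limit] by inverse CDF,
  # working on the log-probability scale to avoid underflow
  mu <- drop(X[wy, , drop = FALSE] %*% beta_star)
  log_p_limit <- pnorm(limit, mean = mu, sd = sigma_star, log.p = TRUE)
  draws <- qnorm(log_p_limit + log(runif(length(mu))),
                 mean = mu, sd = sigma_star, log.p = TRUE)
  pmin(draws, limit)                        # guard against rounding above the limit
}

# Quick check on simulated, hypothetical data
set.seed(1)
x <- matrix(rnorm(100), ncol = 1)
y <- 1 + x[, 1] + rnorm(100)
ry <- y >= 1                                # values below 1 are "not detected"
y[!ry] <- NA
imp <- mice.impute.normbelow(y, ry, x, limit = 1)
```

Newer mice releases call the argument method rather than imputationMethod, and the way an extra argument such as limit reaches the function differs between versions, so check ?mice for your installed release.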
>>> If not, is there another MI tool in R that will allow me to
>>> specify a range of acceptable values for my imputed data?
>> In the function amelia (package "Amelia"), you can specify a "bounds"
>> argument, which allows for such a limitation. However, be aware that
>> this might destroy the basic assumption of Amelia, which is that your
>> data are multivariate normal. Maybe a change of variable is in order
>> (e.g. log(concentration) usually has much better statistical
>> properties than concentration).
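A small base-R illustration of why the log scale helps, using simulated, hypothetical concentrations (this is not Amelia's code): a normal model fitted on the raw scale of a positive, right-skewed variable puts real probability on negative values, while normal draws made on the log scale and back-transformed are positive by construction.

```r
# Simulated, hypothetical concentrations: positive and right-skewed
set.seed(1)
conc <- rlnorm(200, meanlog = 0, sdlog = 1)

# Normal model on the raw scale: some draws fall below zero
raw_draws <- rnorm(1000, mean = mean(conc), sd = sd(conc))

# Normal model on the log scale, back-transformed: always positive
log_draws <- exp(rnorm(1000, mean = mean(log(conc)), sd = sd(log(conc))))

mean(raw_draws < 0)  # share of impossible (negative) raw-scale draws
all(log_draws > 0)
```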
>> Frank Harrell's aregImpute (package Hmisc) has the "curtail" argument
>> (TRUE by default) which limits imputations to the range of observed
>> values.
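What curtailing amounts to can be sketched in base R as clamping each imputed value to the range of the observed values. The helper name curtail_to_observed is hypothetical, and this is not Hmisc's implementation.

```r
# Hypothetical helper: clamp imputed values to [min, max] of the observed data
curtail_to_observed <- function(imputed, observed) {
  lo <- min(observed, na.rm = TRUE)
  hi <- max(observed, na.rm = TRUE)
  pmin(pmax(imputed, lo), hi)
}

observed <- c(2.1, 3.4, 5.0, NA, NA)
curtail_to_observed(c(0.7, 4.2, 6.3), observed)
# returns 2.1 4.2 5.0
```

When values are missing only because they fall below a detection limit, every missing value truly lies below the smallest observed value, so curtailing to the observed range would push all imputations up to at least that minimum. The constraint such data suggest is rather an upper bound at the detection limit.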
>> But if your left-censored variables are your dependent variables (not
>> covariates), may I suggest analyzing these data as censored data, as
>> allowed by Terry Therneau's package "survival"? Code your "missing"
>> data as censored values (coxph() accepts only right-censored data, but
>> survreg() handles left censoring); to do this on the fly, use:
>> survreg(Surv(ifelse(is.na(x), <yourlimit>, x),
>>              !is.na(x), type="left") ~ <Yourmodel>)
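A minimal sketch of the censored-regression route on simulated data. The variable names, the detection limit lod and the Gaussian (Tobit-type) model on log concentration are assumptions for illustration; survreg() and Surv(..., type = "left") come from the survival package.

```r
library(survival)

# Simulated, hypothetical data: log concentration depends on covariate z
set.seed(42)
n <- 200
z <- rnorm(n)
logx_true <- 1 + 0.5 * z + rnorm(n, sd = 0.8)
lod <- 0.8                                      # assumed detection limit (log scale)
logx <- ifelse(logx_true < lod, NA, logx_true)  # below the limit -> reported missing

# Code the missing values as left-censored at the detection limit
observed <- !is.na(logx)
y <- ifelse(observed, logx, lod)

# A Gaussian survreg with left censoring is a Tobit model
fit <- survreg(Surv(y, observed, type = "left") ~ z, dist = "gaussian")
coef(fit)  # should be near the simulated values 1 and 0.5
```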
>> Another possible idea is to split your (supposedly x) variable in two:
>> observed (logical), and value (the observed value if observed, the
>> detection limit if not), and to include both in your model. You will
>> probably run into numerical difficulties due to the built-in total
>> separation.
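The two-variable coding can be sketched as follows for a censored covariate. The outcome, the linear model and lod are assumptions for illustration only.

```r
# Simulated, hypothetical data: covariate x is missing when below lod
set.seed(3)
n <- 150
x_true <- rlnorm(n)
outcome <- 2 + 1.5 * log(x_true) + rnorm(n)
lod <- 0.5
x <- ifelse(x_true < lod, NA, x_true)

# Split x into a detection indicator and a value filled with the limit
x_detected <- !is.na(x)
x_value <- ifelse(x_detected, x, lod)

fit <- lm(outcome ~ x_detected + log(x_value))
coef(fit)
```

Because log(x_value) is constant for every undetected row, its slope is estimated from the detected rows alone, while the indicator absorbs the mean of the undetected group; that built-in separation is where the numerical trouble can come from in less forgiving models.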
>> HTH,
>> 					Emmanuel Charpentier
>>> Thanks for the help,
>>> Todd
> Also see
> @Article{zha09non,
>   author  = {Zhang, Donghui and Fan, Chunpeng and Zhang, Juan and Zhang, {Cun-Hui}},
>   title   = {Nonparametric methods for measurements below detection limit},
>   journal = {Statistics in Medicine},
>   year    = 2009,
>   volume  = 28,
>   pages   = {700--715},
>   annote  = {lower limit of detection; left censoring; Tobit model; Gehan test;
>              Peto-Peto test; log-rank test; Wilcoxon test; location shift model;
>              superiority of nonparametric methods}
> }
> -- 
> Frank E Harrell Jr   Professor and Chair           School of Medicine
>                      Department of Biostatistics   Vanderbilt University

It appears they were dealing with outcomes possibly censored at a  
limit of detection. At least that was the example they used to  

Is there a message that can be inferred about what to do with  
covariates with values below the limit of detection? And can someone  
translate for a non-statistician what the operational process was on
the values below the limit of detection in the Wilcoxon approach that
they endorsed? They transformed the right censored situation into a
left censored one and then they do ... what?
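One common way to run a Gehan/Wilcoxon-type comparison on data with a lower detection limit is to flip the scale: subtracting every value from a constant M larger than all of them turns "below the limit" (left-censored) into "above M - limit" (right-censored), after which standard survival tests apply. Below is a sketch of that generic technique on simulated data, not necessarily the exact procedure of Zhang et al.; the group labels, lod and M are assumptions. In survdiff(), rho = 1 gives the Peto & Peto modification of the Gehan-Wilcoxon test, and rho = 0 gives the log-rank test.

```r
library(survival)

# Simulated, hypothetical concentrations in two groups; group B is shifted up
set.seed(7)
group <- rep(c("A", "B"), each = 100)
conc <- rlnorm(200, meanlog = ifelse(group == "A", 0, 0.7))
lod <- 0.5                               # assumed detection limit
detected <- conc >= lod
conc_rec <- ifelse(detected, conc, lod)  # non-detects recorded at the limit

# Flip: a non-detect (true value < lod) becomes right-censored at M - lod
M <- max(conc_rec) + 1
flipped <- M - conc_rec

# rho = 1: Peto & Peto modification of the Gehan-Wilcoxon test
res <- survdiff(Surv(flipped, detected) ~ group, rho = 1)
res
pchisq(res$chisq, df = 1, lower.tail = FALSE)  # p-value
```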

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
