[R] Multiple Imputation in mice/norm

Frank E Harrell Jr f.harrell at vanderbilt.edu
Sun Apr 26 00:38:17 CEST 2009

David Winsemius wrote:
> On Apr 25, 2009, at 9:25 AM, Frank E Harrell Jr wrote:
>> Emmanuel Charpentier wrote:
>>> Le vendredi 24 avril 2009 à 14:11 -0700, ToddPW a écrit :
>>>> I'm trying to use either mice or norm to perform multiple imputation to fill
>>>> in some missing values in my data.  The data has some missing values because
>>>> of a chemical detection limit (so they are left censored).  I'd like to use
>>>> MI because I have several variables that are highly correlated.  In SAS's
>>>> proc MI, there is an option with which you can limit the imputed values that
>>>> are returned to some range of specified values.  Is there a way to limit the
>>>> values in mice?
>>> You may do that by writing your own imputation function and assigning it
>>> to the imputation of a particular variable (see the argument
>>> "imputationMethod" and the details in the man page for "mice").
>>>>                 If not, is there another MI tool in R that will allow me to
>>>> specify a range of acceptable values for my imputed data?
>>> In the function amelia (package "Amelia"), you can specify a "bounds"
>>> argument, which allows for such a limitation. However, be aware that
>>> this might violate the basic assumption of Amelia, which is that your
>>> data are multivariate normal. Maybe a change of variable is in order
>>> (e.g. log(concentration) usually has much better statistical properties
>>> than concentration).
>>> Frank Harrell's aregImpute (package Hmisc) has the "curtail" argument
>>> (TRUE by default) which limits imputations to the range of observed
>>> values.
>>> But if your left-censored variables are your dependent variables (not
>>> covariates), may I suggest analyzing these data as censored data, as
>>> allowed by Terry Therneau's "survreg" function (package "survival")? Code
>>> your "missing" data as such a variable (use:
>>> survreg(Surv(ifelse(is.na(x), <yourlimit>, x),
>>>              !is.na(x), type="left") ~ <Yourmodel>) to do this on the fly).
>>> Another possible idea is to split your (supposedly x) variable in two:
>>> observed (logical), and value (observed value if observed, <detection
>>> limit> if not), and include both in your model. You will probably
>>> run into numerical difficulties due to the (built-in) total
>>> separation...
>>> HTH,
>>>                     Emmanuel Charpentier
>>>> Thanks for the help,
>>>> Todd
>> Also see
>> @Article{zha09non,
>>  author =  {Zhang, Donghui and Fan, Chunpeng and Zhang, Juan and Zhang, {Cun-Hui}},
>>  title =   {Nonparametric methods for measurements below detection limit},
>>  journal = Stat in Med,
>>  year =    2009,
>>  volume =  28,
>>  pages =   {700-715},
>>  annote =  {lower limit of detection;left censoring;Tobit model;Gehan test;Peto-Peto test;log-rank test;Wilcoxon test;location shift model;superiority of nonparametric methods}
>> }
>> -- 
>> Frank E Harrell Jr   Professor and Chair           School of Medicine
>>                     Department of Biostatistics   Vanderbilt University
> It appears they were dealing with outcomes possibly censored at a limit 
> of detection. At least that was the example they used to illustrate.
> Is there a message that can be inferred about what to do with covariates 
> with values below the limit of detection? And can someone translate to a 
> non-statistician what the operational process was on the values below 
> the limit of detection in the Wilcoxon approach that they endorsed? They 
> transformed the right censored situation into a left censored one and 
> then they do   ... what?
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT

Yes, it's easier to handle in the dependent variable.  For independent 
variables below the limit of detection we are left with model-based 
extrapolation for multiple imputation, with no way to check the 
imputation model's regression assumption, since no values are observed 
in that range.  Predictive mean matching can't be used, because there 
are no observed donor values below the limit.
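
To make "model-based extrapolation" concrete, here is a minimal sketch 
on simulated data; it is an illustration under stated assumptions, not 
a recommended procedure.  It assumes the log-concentration is normal 
given one fully observed variable, fits that model with survreg() 
treating non-detects as left censored, and then draws each non-detect 
from the fitted normal truncated at the limit.  The names z, lx, lod 
and imps, and all numeric values, are hypothetical.

```r
library(survival)  # provides Surv() and survreg()

set.seed(1)
n   <- 500
z   <- rnorm(n)                            # fully observed, correlated variable
lx  <- 0.5 + 0.8 * z + rnorm(n, sd = 0.6)  # true log-concentration (simulated)
lod <- log(0.8)                            # hypothetical detection limit, log scale

below <- lx < lod                # TRUE where the value was not detected
obs   <- ifelse(below, lod, lx)  # non-detects are recorded at the limit

# Gaussian regression for left-censored data (a Tobit model):
# event is TRUE for a measured value, FALSE for a value censored at the limit.
fit <- survreg(Surv(obs, !below, type = "left") ~ z, dist = "gaussian")

# Fitted conditional mean for each non-detect, and the residual SD.
mu    <- predict(fit, type = "response")[below]
sigma <- fit$scale

# Draw m imputations per non-detect from N(mu, sigma) truncated above at lod,
# by the inverse-CDF method: sample u on (0, F(lod)), then invert F.
m    <- 5
imps <- replicate(m, {
  u <- runif(sum(below), 0, pnorm(lod, mu, sigma))
  qnorm(u, mu, sigma)
})

dim(imps)  # one row per non-detect, one column per imputation
```

The draws condition on the point estimates of the coefficients and 
scale, so they understate uncertainty; a proper multiple imputation 
would also draw the parameters for each imputation.  Everything below 
lod comes from the assumed normal tail, which the data cannot check.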


Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
