[R] Multiple Imputation in mice/norm
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Sun Apr 26 00:38:17 CEST 2009
David Winsemius wrote:
>
> On Apr 25, 2009, at 9:25 AM, Frank E Harrell Jr wrote:
>
>> Emmanuel Charpentier wrote:
>>> Le vendredi 24 avril 2009 à 14:11 -0700, ToddPW a écrit :
>>>> I'm trying to use either mice or norm to perform multiple imputation to
>>>> fill in some missing values in my data. The data has some missing values
>>>> because of a chemical detection limit (so they are left censored). I'd
>>>> like to use MI because I have several variables that are highly
>>>> correlated. In SAS's proc MI, there is an option with which you can
>>>> limit the imputed values that are returned to some range of specified
>>>> values. Is there a way to limit the values in mice?
>>> You may do that by writing your own imputation function and assigning it
>>> to the imputation of particular variables (see the argument
>>> "imputationMethod" and the details in the man page for "mice").
>>>> If not, is there another MI tool in R that will allow me to specify a
>>>> range of acceptable values for my imputed data?
>>> In the function amelia (package "Amelia"), you might specify a "bounds"
>>> argument, which allows for such a limitation. However, be aware that
>>> this might violate the basic assumption of Amelia, which is that your
>>> data are multivariate normal. Maybe a change of variable is in order
>>> (e.g. log(concentration) usually has much better statistical properties
>>> than concentration).
>>> Frank Harrell's aregImpute (package Hmisc) has the "curtail" argument
>>> (TRUE by default) which limits imputations to the range of observed
>>> values.
>>> But if your left-censored variables are your dependent variables (not
>>> covariates), may I suggest analyzing these data as censored data, as
>>> allowed by Terry Therneau's "survreg" function (package "survival")?
>>> ("coxph" does not accept left-censored responses.) Code your "missing"
>>> data as such a variable; use
>>> survreg(Surv(pmax(x,<yourlimit>,na.rm=TRUE),
>>> !is.na(x),type="left")~<Yourmodel>) to do this on the fly. pmax works
>>> elementwise and puts the limit in place of each NA, whereas min would
>>> collapse x to a single number.
>>> Another possible idea is to split your variable (x, say) in two:
>>> observed (logical), and value (the observed value if observed, the
>>> <detection limit> if not), and include both in your model. You will
>>> probably run into numerical difficulties due to the built-in total
>>> separation.
>>> HTH,
>>> Emmanuel Charpentier
>>>> Thanks for the help,
>>>> Todd
>>>>
>>
>> Also see
>>
>> @Article{zha09non,
>> author = {Zhang, Donghui and Fan, Chunpeng and Zhang,
>> Juan and Zhang, {Cun-Hui}},
>> title = {Nonparametric methods for measurements below
>> detection limit},
>> journal = Stat in Med,
>> year = 2009,
>> volume = 28,
>> pages = {700-715},
>> annote = {lower limit of detection;left censoring;Tobit
>> model;Gehan test;Peto-Peto test;log-rank test;Wilcoxon test;location
>> shift model;superiority of nonparametric methods}
>> }
>>
>>
>> --
>> Frank E Harrell Jr Professor and Chair School of Medicine
>> Department of Biostatistics Vanderbilt University
>>
>
> It appears they were dealing with outcomes possibly censored at a limit
> of detection. At least that was the example they used to illustrate.
>
> Is there a message that can be inferred about what to do with covariates
> with values below the limit of detection? And can someone translate to a
> non-statistician what the operational process was on the values below
> the limit of detection in the Wilcoxon approach that they endorsed? They
> transformed the right censored situation into a left censored one and
> then they do ... what?
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
>
Yes, it's easier to handle in the dependent variable. For independent
variables below the limit of detection we are left with model-based
extrapolation for multiple imputation, with no way to check the
imputation model's regression assumption. Predictive mean matching
can't be used.
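
A rough base-R sketch of that last point, on simulated data. Everything in
it (the variable names, the detection limit of 1, the log-normal model, the
five-donor matching rule) is an assumption made for illustration, not
something taken from mice, Hmisc, or the paper cited above. It shows that
matching can only hand back observed values, which all sit at or above the
limit, while a draw from the fitted model truncated at the limit lands below
it but leans entirely on a regression that was never observed in that range.

```r
set.seed(1)
n     <- 200
limit <- 1                                  # hypothetical detection limit
z     <- rnorm(n)                           # fully observed, correlated covariate
x     <- exp(0.5 * z + rnorm(n, sd = 0.5))  # true concentration (simulated)
below <- x < limit
x_obs <- ifelse(below, NA, x)               # values under the limit recorded as NA

# Regression of log(x) on z, fitted on detected cases only. This is a
# simplification: a careful imputation model would account for the censoring
# and would also draw the regression parameters for each imputation.
fit  <- lm(log(x_obs) ~ z)
pred <- predict(fit, newdata = data.frame(z = z))

# Predictive mean matching: borrow an observed value from one of the five
# detected cases with the closest predicted mean. Donors are all >= limit.
donors <- which(!below)
pmm <- sapply(which(below), function(i) {
  nearest <- donors[order(abs(pred[donors] - pred[i]))[1:5]]
  x_obs[sample(nearest, 1)]
})

# Model-based extrapolation: draw log(x) from the fitted normal, truncated
# above at log(limit), using the inverse-CDF method.
sigma <- summary(fit)$sigma
mu    <- pred[below]
p_hi  <- pnorm((log(limit) - mu) / sigma)   # P(log(x) < log(limit)) per case
ext   <- exp(mu + sigma * qnorm(runif(sum(below)) * p_hi))

range(pmm)   # never below the limit
range(ext)   # never above the limit
```

The extrapolated draws look plausible, but nothing in the detected cases can
confirm that the linear relationship continues below the limit.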
Frank
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University
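
For the easier case, where the measurement below the detection limit is the
dependent variable, one parametric option is a left-censored regression with
survreg from the survival package. The sketch below uses simulated data; the
names, the limit of 1, and the log-normal distribution are all assumptions
for illustration. It is not the nonparametric approach of the paper cited
above.

```r
library(survival)

set.seed(2)
n     <- 300
limit <- 1                                   # hypothetical detection limit
group <- rep(c(0, 1), each = n / 2)          # hypothetical two-group predictor
y     <- exp(0.2 + 0.5 * group + rnorm(n, sd = 0.7))
detected <- y >= limit
y_rec <- ifelse(detected, y, NA)             # what the lab reports: NA below the limit

# Replace each NA by the limit; detected values are left unchanged because
# they are already >= limit.
value <- pmax(y_rec, limit, na.rm = TRUE)

# In Surv(), event = TRUE marks an exactly observed value and FALSE marks a
# left-censored one (true value somewhere below 'value').
fit <- survreg(Surv(value, detected, type = "left") ~ group,
               dist = "lognormal")
coef(fit)   # coefficients are on the log(y) scale
```

The choice of dist is itself a modelling assumption; with a heavy fraction
of values below the limit the fit can be sensitive to it.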