[R] multiple imputation with fit.mult.impute in Hmisc - how to replace NA with imputed value?

Frank E Harrell Jr f.harrell at vanderbilt.edu
Wed Nov 26 17:46:11 CET 2008


Charlie Brush wrote:
> Frank E Harrell Jr wrote:
>> Charlie Brush wrote:
>>> I am doing multiple imputation with Hmisc, and
>>> can't figure out how to replace the NA values with
>>> the imputed values.
>>>
>>> Here's a general outline of the process:
>>>
>>>  > set.seed(23)
>>>  > library("mice")
>>>  > library("Hmisc")
>>>  > library("Design")
>>>  > d <- read.table("DailyDataRaw_01.txt",header=T)
>>>  > length(d);length(d[,1])
>>> [1] 43
>>> [1] 2666
>>> So for this data set, there are 43 columns and 2666 rows.
>>>
>>> Here is a piece of data.frame d:
>>>  > d[1:20,4:6]
>>>  P01  P02  P03
>>> 1  0.1 0.16 0.16
>>> 2   NA 0.00 0.00
>>> 3   NA 0.60 0.04
>>> 4   NA 0.15 0.00
>>> 5   NA 0.00 0.00
>>> 6  0.7 0.00 0.75
>>> 7   NA 0.00 0.00
>>> 8   NA 0.00 0.00
>>> 9  0.0 0.00 0.00
>>> 10 0.0 0.00 0.00
>>> 11 0.0 0.00 0.00
>>> 12 0.0 0.00 0.00
>>> 13 0.0 0.00 0.00
>>> 14 0.0 0.00 0.00
>>> 15 0.0 0.00 0.03
>>> 16  NA 0.00 0.00
>>> 17  NA 0.01 0.00
>>> 18 0.0 0.00 0.00
>>> 19 0.0 0.00 0.00
>>> 20 0.0 0.00 0.00
>>>
>>> These are daily precipitation values at NCDC stations, and
>>> NA values at station P01 will be filled using multiple
>>> imputation and data from highly correlated stations P02 and P08:
>>>
>>>  > f <- aregImpute(~ I(P01) + I(P02) + I(P08), n.impute=10, match='closest', data=d)
>>> Iteration 13
>>>  > fmi <- fit.mult.impute( P01 ~ P02 + P08 , ols, f, d)
>>>
>>> Variance Inflation Factors Due to Imputation:
>>>
>>> Intercept       P02       P08
>>>    1.01      1.39      1.16
>>>
>>> Rate of Missing Information:
>>>
>>> Intercept       P02       P08
>>>    0.01      0.28      0.14
>>>
>>> d.f. for t-distribution for Tests of Single Coefficients:
>>>
>>> Intercept       P02       P08
>>> 242291.18    116.05    454.95
>>>  > r <- apply(f$imputed$P01,1,mean)
>>>  > r
>>>    2     3     4     5     7     8    16    17   249   250   251
>>> 0.002 0.430 0.044 0.002 0.002 0.002 0.002 0.123 0.002 0.002 0.002
>>>  252   253   254   255   256   257   258   259   260   261   262
>>> 1.033 0.529 1.264 0.611 0.002 0.513 0.085 0.002 0.705 0.840 0.719
>>>  263   264   265   266   267   268   269   270   271   272   273
>>> 1.489 0.532 0.150 0.134 0.002 0.002 0.002 0.002 0.002 0.055 0.135
>>>  274   275   276   277   278   279   280   281   282   283   284
>>> 0.009 0.002 0.002 0.002 0.008 0.454 1.676 1.462 0.071 0.002 1.029
>>>  285   286   287   288   289   418   419   420   421   422   700
>>> 0.055 0.384 0.947 0.002 0.002 0.008 0.759 0.066 0.009 0.002 0.002
>>>
>>> ------------------------------------------------------------------
>>> So far, this is working great.
>>> Now, make a copy of d:
>>>  > dnew <- d
>>>
>>> And then fill in the NA values in P01 with the values in r
>>>
>>> For example:
>>>  > for (i in 1:length(r)){
>>>    dnew$P01[r[i,1]] <- r[i,2]
>>>    }
>>> This doesn't work, because each 'piece' of r is two numbers:
>>>  > r[1]
>>>   2
>>> 0.002
>>>  > r[1,1]
>>> Error in r[1, 1] : incorrect number of dimensions
>>>
>>> My question: how can I separate the two items in (for example)
>>> r[1] to use the first part as an index and the second as a value,
>>> and then use them to replace the NA values with the imputed values?
>>>
>>> Or is there a better way to replace the NA values with the imputed 
>>> values?
>>>
>>> Thanks in advance for any help.
>>>
>>
>> You didn't state your goal, or why fit.mult.impute does not do what 
>> you want.   But you can look inside fit.mult.impute to see how it 
>> retrieves the imputed values.  Also see the example in the documentation 
>> for transcan, in which the command impute(xt, imputation=1) is used to 
>> retrieve one of the multiple imputations.
>>
>> Note that you can say library(Design) (omit the quotes) to access both 
>> Design and Hmisc.
>>
>> Frank
> Thanks for your help.
> My goal is to replace the NA values in the (copy of the) data frame with 
> the means of the imputed values (which are now in variable 'r').
> fit.mult.impute works fine. I just can't figure out the last step, 
> taking the results of fit.mult.impute (which are in variable 'r') and 
> replacing the NA values in the (copy of the) data frame.
> A simple for loop doesn't work because the items in 'r' don't look like 
> a normal vector, as for example r[1] returns
>  2
> 0.002
> Is there a command to replace the NA values in the data frame with the 
> means of the imputed values?
> 
> Thanks,
> Charlie
> 

Don't do that, as this would no longer be multiple imputation.  If you 
want single conditional mean imputation use transcan.
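
A minimal sketch of the transcan route, assuming the data can be stood in 
for by simulated values (the poster's file is not available, so the numbers 
below are made up and only the column names P01, P02 and P08 are borrowed 
from the question). It follows the pattern shown in the Hmisc help pages: 
fit transcan with imputed=TRUE, then call impute() with list.out=TRUE and 
copy the result into a copy of the data frame.

```r
library(Hmisc)

set.seed(1)
n <- 200

# Simulated stand-in data (assumption: NOT the poster's precipitation data)
P02 <- rnorm(n)
P08 <- rnorm(n)
P01 <- P02 + 0.5 * P08 + rnorm(n, sd = 0.3)
P01[sample(n, 30)] <- NA          # knock out 30 values of P01
d <- data.frame(P01, P02, P08)

# Single imputation: imputed=TRUE and no n.impute
xt <- transcan(~ P01 + P02 + P08, imputed = TRUE, data = d,
               pl = FALSE, pr = FALSE)

# list.out=TRUE returns a list of the variables with NAs filled in
filled <- impute(xt, data = d, list.out = TRUE, pr = FALSE)

# Copy the filled-in variables into a copy of the data frame
dnew <- d
dnew[names(filled)] <- filled
```

Standard errors from a model fitted to dnew will not reflect the 
uncertainty in the imputed values, which is the reason for using 
fit.mult.impute in the first place.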

Frank
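
On the indexing question itself: apply(f$imputed$P01, 1, mean) returns a 
named numeric vector, so r[1] is a single number printed with its name (the 
row number) above it, not a pair of numbers. The base R sketch below shows 
this with a small made-up matrix standing in for f$imputed$P01 (an 
assumption: rows are the missing observations, named by row number, and 
columns are the imputations). Rather than averaging, it builds one 
completed copy of the data per imputation, which keeps the imputations 
separate.

```r
# Toy stand-in for the data frame (values are made up)
d <- data.frame(P01 = c(0.1, NA, NA, 0.7, NA),
                P02 = c(0.16, 0.00, 0.60, 0.00, 0.15))

# Toy stand-in for f$imputed$P01: one row per missing observation
# (row names are row numbers in d), one column per imputation
imp <- matrix(c(0.00, 0.40, 0.05,
                0.01, 0.45, 0.02,
                0.00, 0.50, 0.10),
              nrow = 3,
              dimnames = list(c("2", "3", "5"), NULL))

# A named numeric vector: names carry the row numbers
r   <- apply(imp, 1, mean)
idx <- as.integer(names(r))   # row numbers of the missing values
val <- unname(r)              # the values without their names

# One completed copy of d per imputation (not one mean-filled copy)
rows <- as.integer(rownames(imp))
completed <- lapply(seq_len(ncol(imp)), function(i) {
  di <- d
  di$P01[rows] <- imp[, i]
  di
})
```

Each element of completed can then be analysed separately and the results 
combined, which is what fit.mult.impute does internally.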


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University


