[R] Multiple imputation, especially in rms/Hmisc packages

Tue Aug 10 03:24:42 CEST 2010

Hello, I have a general question about combining imputations as well as a
question specific to the rms and Hmisc packages.

The situation is multiple regression on a data set where multiple
imputation has been used to give M imputed data sets. I know how to get
the combined estimate of the covariance matrix of the estimated
coefficients (average the M covariance matrices from the individual
imputations and add (M+1)/M times the between-imputation covariance
matrix), and I know how to use this to get p-values and confidence
intervals for the individual coefficients.

What I don't know is how to combine imputations to obtain the multiple
degree-of-freedom tests to test whether a set of coefficients are all 0.
If there were no missing values, this would be done using the "general
linear test" which involves fitting the full model and the reduced
model and getting an F statistic based on the residual sums of squares and
degrees of freedom. How does one correctly do this when there are multiple
imputations? In the language of Hmisc, I'm asking how the values in
anova(fmi) are calculated if fmi is the result of fit.mult.impute (I've
looked at the anova.rms code but couldn't understand it).

The second question I have relates to rms/Hmisc specifically. When I use
fit.mult.impute, the fitted values don't appear to match the values given
by the predict function on the original data. Instead, they match the
fitted values of the last imputation. For example,

library(rms)
set.seed(1)
x1 <- rnorm(40)
x2 <- 0.5*x1 + rnorm(40,0,3)

y <- x1^2 + x2 + rnorm(40,0,3)
x2[40] <- NA   # create 1 missing value in x2
a <- aregImpute(~ y + x1 + x2, n.impute=2, nk=3) # 2 imputations

x2.imp1 <- x2; x2.imp1[40] <- a$imputed$x2[,1]  # first imputed x2 vector
x2.imp2 <- x2; x2.imp2[40] <- a$imputed$x2[,2]  # second imputed x2 vector

ols.imp1 <- ols(y ~ rcs(x1,3) + x2.imp1)  # model on imputation 1
ols.imp2 <- ols(y ~ rcs(x1,3) + x2.imp2)  # model on imputation 2

d <- data.frame(y, x1, x2)
fmi <- fit.mult.impute(y ~ rcs(x1,3) + x2, ols, a, data=d)

fmi$fitted
predict(fmi, d)

I would have expected the results of fmi$fitted and predict(fmi,d) to be
the same except for the final NA, but they are not. Instead, fmi$fitted is
the same as ols.imp2$fitted. Also,

anova(ols.imp2)
anova(fmi)

unexpectedly give the same result in the ERROR line. I would be most
grateful if anyone could explain this to me.

Thanks,
Mark
-- 
Mark Seeto
Statistician

National Acoustic Laboratories <http://www.nal.gov.au/>
A Division of Australian Hearing