[R] how to combine imputed data-sets from mice for classification
Eleni Rapsomaniki
e.rapsomaniki at mail.cryst.bbk.ac.uk
Mon Oct 30 09:18:56 CET 2006
Dear R users
I want to combine multiply imputed data-sets generated by mice to do
classification.
However, I have several questions about using the mice library.
For example, suppose I want to predict the class in this data.frame:
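library(mice)   # provides mice() and the nhanes example data
data(nhanes)
mydf=nhanes
# add an artificial two-level class label: about half "neg", the rest "pos"
mydf$class="pos"
mydf$class[sample(1:nrow(mydf), size=floor(0.5*nrow(mydf)))]="neg"
mydf$class=factor(mydf$class)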
First I impute:
imp=mice(mydf)
I want to use randomForest for my analysis, not the built-in (g)lm.mids
functions.
A previous post suggested replacing the call to (g)lm.mids with a loop that
runs whatever analysis one needs on each completed data-set:
(from http://tolstoy.newcastle.edu.au/R/help/06/03/22295.html)
analyses <- as.list(1:data$m)
for (i in 1:data$m) {
    data.i <- complete(data, i)
    analyses[[i]] <- lm(formula, data = data.i, ...)
}
Is the idea that I should then simply combine the results (predictions) of
randomForest from all 5 data-sets? In that case, what does the pool function do?
Do I need to use it?
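
One way the loop could be adapted to randomForest is sketched below. It assumes the randomForest package is installed, and it combines the forests by averaging their out-of-bag class probabilities across the imputed data-sets. That averaging step is an assumption of this sketch, not something mice provides. The names fits, probs and pooled.class are made up for illustration. pool() is meant for combining coefficient estimates and their variances from fitted models (Rubin's rules), so it would not apply directly to a forest, which has no coefficients.

```r
library(mice)
library(randomForest)

set.seed(1)
data(nhanes)
mydf <- nhanes
mydf$class <- "pos"
mydf$class[sample(nrow(mydf), size = floor(0.5 * nrow(mydf)))] <- "neg"
mydf$class <- factor(mydf$class)

imp <- mice(mydf)

# Fit one forest per completed data-set.
fits <- lapply(seq_len(imp$m), function(i) {
  randomForest(class ~ ., data = complete(imp, i))
})

# Out-of-bag class probabilities from each forest (one row per observation),
# averaged over the imp$m imputations.
probs <- Reduce(`+`, lapply(fits, function(f) predict(f, type = "prob"))) / imp$m

# Combined prediction: the class with the highest averaged probability.
pooled.class <- factor(colnames(probs)[max.col(probs, ties.method = "first")],
                       levels = levels(mydf$class))
table(observed = mydf$class, predicted = pooled.class)
```

Because the class labels here are assigned at random, the resulting table should show roughly chance-level agreement; the point is only the mechanics of combining the per-imputation predictions.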
Also, if I use glm.mids for my predictions, I get an error:
> imp.fit=glm.mids(class ~., data=imp)
Error: NA/NaN/Inf in foreign function call (arg 4)
In addition: Warning messages:
1: - not meaningful for factors in: Ops.factor(y, mu)
2: - not meaningful for factors in: Ops.factor(eta, offset)
3: - not meaningful for factors in: Ops.factor(y, mu)
But this works:
> imp.fit=glm.mids((class=="pos") ~., data=imp)
In this case I don't know how to interpret the result.
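
The error is most likely due to glm.mids, like glm, defaulting to family = gaussian, which cannot handle a factor response. The (class=="pos") version then fits an ordinary linear model to a TRUE/FALSE outcome rather than a classifier. A minimal sketch of a logistic fit follows, assuming a version of mice that provides with() for mids objects; in the version used above, passing family = binomial to glm.mids would be the counterpart. The predictors are written out because "." cannot be expanded without a data argument, and the name pooled.fit is illustrative.

```r
library(mice)

set.seed(1)
data(nhanes)
mydf <- nhanes
mydf$class <- "pos"
mydf$class[sample(nrow(mydf), size = floor(0.5 * nrow(mydf)))] <- "neg"
mydf$class <- factor(mydf$class)

imp <- mice(mydf)

# Logistic regression on each completed data-set. With a factor response,
# glm models the probability of the second level ("pos").
fit <- with(imp, glm(class ~ age + bmi + hyp + chl, family = binomial))

# Combine the coefficient estimates across imputations with Rubin's rules.
pooled.fit <- pool(fit)
summary(pooled.fit)
```

The pooled coefficients are on the log-odds scale for class "pos". With only 25 rows and randomly assigned labels, no predictor is expected to show a real effect.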
I would appreciate any suggestions on these.
Many Thanks
Eleni Rapsomaniki