[R] Combining imputed datasets for analysis using Factor Analysis

Mon Aug 20 16:19:36 CEST 2012

Dear R users and developers,

I have a dataset containing 34 variables measured in a survey, which has 
some missing items. I would like to conduct a factor analysis of these 
data. I tested mi, Amelia, and missForest as alternative packages for 
imputing the missing data. I now have five separate imputed datasets with 
the variables I am interested in factor analysing. In my reading of the 
package help files and of various articles and books, I have come across a 
number of suggestions for combining analyses (mostly regression or other 
linear models) using Rubin's (1987) rules.
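
For reference, this is how I understand those rules for a single scalar estimate (for example, one regression coefficient). It is only a minimal sketch: the numbers are made up, and pool_rubin is a hypothetical helper name, not a function from any package.

```r
# Made-up point estimates and squared standard errors from m = 5 imputed datasets
q <- c(0.52, 0.48, 0.55, 0.50, 0.45)       # estimate from each imputed dataset
u <- c(0.010, 0.012, 0.011, 0.009, 0.010)  # within-imputation variance of each estimate

# Hypothetical helper: pool one scalar estimate with Rubin's rules
pool_rubin <- function(q, u) {
  m <- length(q)
  q_bar <- mean(q)              # pooled point estimate
  w <- mean(u)                  # average within-imputation variance
  b <- var(q)                   # between-imputation variance
  t_var <- w + (1 + 1 / m) * b  # total variance
  list(estimate = q_bar, within = w, between = b,
       total = t_var, se = sqrt(t_var))
}

pooled <- pool_rubin(q, u)
pooled$estimate
pooled$se
```

This works when each analysis returns an estimate together with its variance, which is what I am unsure how to carry over to factor analysis.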

However, I am not sure how to proceed in the case of factor 
analysis. Should I calculate the covariance or correlation matrix 
for each imputed dataset, combine these estimates, and then perform a 
single factor analysis? Or should I conduct a FA on each completed dataset 
and then combine the results (say, eigenvalues or fit statistics)? Could 
anyone point me to literature (if possible, not overly technical) that 
addresses this question? Or provide an example of a script that 
would help me achieve this?
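
To make the first option concrete, here is a rough sketch of what I have in mind. It uses simulated continuous data and base R only; the list `imputed` is a stand-in for my five completed datasets, and I use Pearson correlations via cor() where, for my ordinal items, I would presumably substitute polychoric correlations. Whether simple averaging of the matrices is statistically defensible is exactly what I am asking.

```r
set.seed(1)
n <- 300  # respondents
p <- 6    # items (34 in my real data)
m <- 5    # number of imputed datasets

# Simulate two-factor data as a stand-in for the survey
f <- matrix(rnorm(n * 2), n, 2)
lambda <- rbind(c(.8, 0), c(.7, 0), c(.6, 0),
                c(0, .8), c(0, .7), c(0, .6))
x <- f %*% t(lambda) + matrix(rnorm(n * p, sd = 0.6), n, p)
colnames(x) <- paste0("V", 1:p)

# Stand-in for the completed datasets: small noise mimics
# the variation between imputations
imputed <- lapply(seq_len(m), function(i) {
  as.data.frame(x + matrix(rnorm(n * p, sd = 0.1), n, p))
})

# Option 1: one correlation matrix per imputed dataset, averaged elementwise
cors <- lapply(imputed, cor)
pooled.cor <- Reduce(`+`, cors) / m

# A single factor analysis on the pooled matrix
fa.pooled <- factanal(covmat = pooled.cor, factors = 2, n.obs = n)
print(fa.pooled$loadings, cutoff = 0.3)
```

The second option would instead call factanal() inside the lapply() and then combine the loadings, but the solutions would first need to be aligned (sign and rotation), which I do not know how to do properly.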

Your assistance and time are much appreciated.

Kind Regards,
Conrad Zygmont
Psychology Department
Helderberg College
South Africa

Additional info:
R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"
Running on Linux version 3.3.8-gentoo (root at PsychStat) (gcc version 
4.5.3 (Gentoo 4.5.3-r2 p1.5, pie-0.4.7) )

Script for multiple imputation:
 > var.info <- mi.info(LRN)
 > var.info
 > var.info <- update(var.info, "type", setNames(as.list(rep(
"ordered-categorical", 34)), paste0("LRN", 1:34)))
 > prepared.data <- mi.preprocess(LRN, info = var.info)
 > ImpLRN <- mi(prepared.data, n.imp = 5, n.iter = 50, 
check.coef.convergence = TRUE, add.noise = noise.control(post.run.iter = 
30))
 > LRN.imputed <- mi.completed(ImpLRN)
 > LRN.first <- mi.data.frame(ImpLRN, m=1)
 > cov.mat <- polychoric(LRN.first,std.err=TRUE)
... and so on