[R] Maintaining data order in factanal with missing data

Fri Jul 26 15:33:35 CEST 2013

Hi Petr,

So sorry, I accidentally attached the complete data set rather than the one
with missing values. I've attached the correct file to this email. RE:
init.dfs() being local, I hadn't even thought of that. I've been away from
OOP for close to 15 years now, so it might be time to revise!

The problem I have is that, with missing values, the list of factor scores
returned (ab.w1.fa$factor.scores) does not map onto the originating data
frame (ab.w1.df), because it no longer includes the cases that had missing
values. So while the original data set for ab.w1.df contains 154 ordered
cases, the factor analysis returns scores for only 150 of them.
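A minimal sketch of what is happening, using a small hypothetical data frame (the names df, cc and kept and all values are invented for illustration). Subsetting with complete.cases() drops the incomplete rows, but the surviving rows keep their original row names. The names "1", "2", ... in the str() output further down in this thread are such row names, carried through to the scores.

```r
# Hypothetical toy data: an ID column plus two items with missing values
df <- data.frame(dmid = 101:105,
                 a = c(1, NA, 3, 4, 5),
                 b = c(2, 2, NA, 4, 1))

cc <- complete.cases(df)   # TRUE only for rows with no NA in any column
kept <- df[cc, ]           # this is what gets passed to factanal()

nrow(df)        # 5
nrow(kept)      # 3
rownames(kept)  # "1" "4" "5": original positions survive as row names
kept$dmid       # 101 104 105
```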

I am seeking a way to map the values derived from the factor analysis
(ab.w1.fa$factor.scores) back to their original ordered position, so that
these factor score variables may be merged back into the master data frame
(ab.df). A unique ID for each case is available ($dmid), which I had thought
to use when merging the new variables, but I don't know how to implement
this.
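One way to do this is to keep the dmid values of the complete cases alongside their scores, then merge() that onto the full frame with all.x = TRUE so that incomplete cases get NA. The sketch below uses simulated data: ab.w1.df here is a hypothetical stand-in with invented item names item1 to item5, not the real data. It also uses raw factanal() scores, though rescaled.scores would be handled the same way.

```r
set.seed(1)
n <- 154
f <- rnorm(n)  # simulated common factor

# Hypothetical stand-in for ab.w1.df: an ID plus five noisy indicators of f
items.mat <- sapply(c(0.8, 0.7, 0.6, 0.7, 0.5),
                    function(l) l * f + rnorm(n, sd = 0.6))
colnames(items.mat) <- paste0("item", 1:5)
ab.w1.df <- data.frame(dmid = 1000 + seq_len(n), items.mat)
ab.w1.df[c(3, 17, 60, 121), "item2"] <- NA  # four incomplete cases

items <- ab.w1.df[-1]                 # drop dmid before the factor analysis
cc <- complete.cases(items)
fa <- factanal(items[cc, ], factors = 1, scores = "regression")

# Pair each score with the dmid of the case it came from
scores.df <- data.frame(dmid = ab.w1.df$dmid[cc],
                        fa.score = fa$scores[, 1])

# Left join: every case is kept; incomplete ones get NA for fa.score
merged <- merge(ab.w1.df, scores.df, by = "dmid", all.x = TRUE)
```

merge() sorts its result by dmid by default. If the master frame (ab.df) is in some other order, `ab.df$fa.score <- scores.df$fa.score[match(ab.df$dmid, scores.df$dmid)]` is an alternative that keeps the existing row order.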

Thanks for your help,

Justin

-----Original Message-----
From: PIKAL Petr [mailto:petr.pikal at precheza.cz] 
Sent: Friday, 26 July 2013 10:59 PM
To: Justin Delahunty; Justin Delahunty; r-help at r-project.org
Subject: RE: [R] Maintaining data order in factanal with missing data

Hi

Well, the function init.dfs does nothing, as none of the data frames created
inside it propagate to the global environment and the function does not
return them.
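A minimal sketch of the usual fix, assuming the goal is simply to get the objects back out of the function: collect them in a list and return it. make.waves() and the toy ab.df below are hypothetical, not part of the original code.

```r
# Hypothetical toy master frame
ab.df <- data.frame(dmid = 1:4,
                    x = c(1, NA, 3, 4),
                    y = c(5, 6, 7, 8))

make.waves <- function(df) {
  w1 <- df[, c("dmid", "x")]   # local to this call only
  w2 <- df[, c("dmid", "y")]
  list(w1 = w1, w2 = w2)       # the last expression is the return value
}

waves <- make.waves(ab.df)
exists("w1", envir = globalenv(), inherits = FALSE)  # FALSE: w1 stayed local
nrow(waves$w1)                                       # 4
```

Applied to init.dfs(), the same idea would end with something like `list(ab.1.fa = ab.1.fa, ab.2.fa = ab.2.fa, ab.3.fa = ab.3.fa)`. Assigning with `<<-` inside the function would also reach the global environment, but is usually considered harder to follow.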

The last line (when used outside a function) gives warnings, but there is no
sign of an error.

When I run the lines outside the function:

> head(ab.1.df)
  dmid   g5oab2      g53      g54      g55   g5ovb1
1    1 1.418932 1.805227 2.791152 3.624116 3.425586
2    2 2.293907 1.187830 1.611237 1.748526 3.816533
3    3 2.836536 2.679523 1.279639 2.674986 2.452395
4    4 1.872259 3.278359 1.785872 2.458315 1.146480
5    5 1.467195 1.180747 3.564127 3.007682 2.109506
6    6 3.098512 3.151974 3.969379 3.750571 1.497358
> head(ab.2.df)
  dmid   w2oab3      w22      w23      w24   w2ovb1
1    1 4.831362 5.522764 7.809366 6.969172 7.398385
2    2 6.706346 4.101742 1.434697 5.266775 5.357641
3    3 3.653806 2.666885 1.209326 5.125556 4.963374
4    4 7.221255 7.649152 6.540398 6.648506 2.576081
5    5 1.848023 5.044314 2.761881 3.307220 1.454234
6    6 7.606429 4.911766 2.034813 2.638573 2.818834
> head(ab.3.df)
  dmid   w3oab3   w3oab4   w3oab7   w3oab8   w3ovb1
1    1 5.835609 6.108220 6.587721 2.451461 2.785467
2    2 4.973198 1.196815 6.388056 1.110877 4.226463
3    3 3.800367 6.697287 5.235345 6.666829 6.319073
4    4 1.093141 1.477773 2.269252 3.194978 4.916342
5    5 1.975060 7.204516 4.825435 1.775874 3.484027
6    6 3.273361 2.243805 5.326547 5.720892 6.118723
>

> str(ab.1.fa)
List of 2
 $ rescaled.scores: Named num [1:154] 3.43 3.83 2.43 1.1 2.08 ...
  ..- attr(*, "names")= chr [1:154] "1" "2" "3" "4" ...
 $ factor.loadings: Named num [1:5] -0.0106 -0.0227 -0.1093 -0.0912 0.9975
  ..- attr(*, "names")= chr [1:5] "g5oab2" "g53" "g54" "g55" ...
> str(ab.2.fa)
List of 2
 $ rescaled.scores: Named num [1:154] 6.34 5.24 5.3 1.91 2.16 ...
  ..- attr(*, "names")= chr [1:154] "1" "2" "3" "4" ...
 $ factor.loadings: Named num [1:5] -0.2042 0.0063 -0.2287 -0.0119 0.7138
  ..- attr(*, "names")= chr [1:5] "w2oab3" "w22" "w23" "w24" ...
> str(ab.3.fa)
List of 2
 $ rescaled.scores: Named num [1:154] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  ..- attr(*, "names")= chr [1:154] "1" "2" "3" "4" ...
 $ factor.loadings: Named num [1:5] -0.1172 0.0128 -0.0968 0.106 0.9975
  ..- attr(*, "names")= chr [1:5] "w3oab3" "w3oab4" "w3oab7" "w3oab8" ...

Anyway, I have no idea what you consider to be wrong.

Regards
Petr

> -----Original Message-----
> From: Justin Delahunty [mailto:ACU at genius.net.au]
> Sent: Friday, July 26, 2013 2:22 PM
> To: PIKAL Petr; 'Justin Delahunty'; r-help at r-project.org
> Subject: RE: [R] Maintaining data order in factanal with missing data
> 
> Hi Petr,
> 
> Thanks for the quick response. Unfortunately I cannot share the data I
> am working with, but please find attached a suitable R workspace with
> generated data. It has the appropriate variable names; only the data
> have been changed.
>
> I call the last function in the list, init.dfs(), to subset the
> overall data set into the three waves and then conduct the factor
> analysis on each (1-factor CFA); it's wrapped in a function only to
> save retyping in a new workspace.
>
> Thanks,
> 
> Justin
> 
> -----Original Message-----
> From: PIKAL Petr [mailto:petr.pikal at precheza.cz]
> Sent: Friday, 26 July 2013 7:35 PM
> To: Justin Delahunty; r-help at r-project.org
> Subject: RE: [R] Maintaining data order in factanal with missing data
> 
> Hi
> 
> You provided the functions, so far so good. But without data it is
> quite difficult to understand what the functions do and where the
> issue could be.
> 
> I suspect a combination of complete-case selection together with subset
> and factor behaviour, but I could be completely off target too.
> 
> Petr
> 
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of s00123776 at myacu.edu.au
> > Sent: Friday, July 26, 2013 9:35 AM
> > To: r-help at r-project.org
> > Subject: [R] Maintaining data order in factanal with missing data
> >
> > Hi,
> >
> > I'm new to R, so sorry if this has a simple answer. I'm currently
> > trying to collapse some ordinal variables into a composite; the
> > program should ideally take a data frame as input, perform a factor
> > analysis, compute factor scores, SDs, etc., and return the rescaled
> > scores and loadings. The difficulty I'm having is that my data set
> > contains a number of NAs, which I exclude from the analysis using
> > complete.cases(), so the incomplete cases are "skipped". These
> > functions are for a longitudinal data set with repeated waves of
> > data, so the final rescaled scores from each wave need to be saved
> > as variables keyed by a unique ID (DMID). The functions I'm trying
> > to implement are as follows:
> >
> > weighted.sd <- function(x, w){
> >     sum.w <- sum(w)
> >     sum.w2 <- sum(w^2)
> >     mean.w <- sum(x*w)/sum(w)
> >     x.sd.w <- sqrt((sum.w/(sum.w^2 - sum.w2))*sum(w*(x - mean.w)^2))
> >     return(x.sd.w)
> > }
> >
> > re.scale <- function(f.scores, raw.data, loadings){
> >     # z-standardize the factor scores
> >     fz.scores <- (f.scores - mean(f.scores))/sd(f.scores)
> >     # per-case loading-weighted means and SDs of the raw items
> >     means <- apply(raw.data, 1, weighted.mean, w = loadings)
> >     sds <- apply(raw.data, 1, weighted.sd, w = loadings)
> >     grand.mean <- mean(means)
> >     grand.sd <- mean(sds)
> >     # put the z-scores back on the metric of the raw items
> >     final.scores <- (fz.scores*grand.sd) + grand.mean
> >     return(final.scores)
> > }
> >
> > get.scores <- function(data){
> >     fact <- factanal(data[complete.cases(data), ], factors = 1, scores = "regression")
> >     f.scores <- fact$scores[, 1]
> >     f.loads <- fact$loadings[, 1]
> >     rescaled.scores <- re.scale(f.scores, data[complete.cases(data), ], f.loads)
> >     output.list <- list(rescaled.scores, f.loads)
> >     names(output.list) <- c("rescaled.scores", "factor.loadings")
> >     return(output.list)
> > }
> >
> > init.dfs <- function(){
> >     ab.1.df <- subset(ab.df, , select = c(dmid, g5oab2:g5ovb1))
> >     ab.2.df <- subset(ab.df, , select = c(dmid, w2oab3:w2ovb1))
> >     ab.3.df <- subset(ab.df, , select = c(dmid, w3oab3, w3oab4, w3oab7, w3oab8, w3ovb1))
> >
> >     ab.1.fa <- get.scores(ab.1.df[-1])
> >     ab.2.fa <- get.scores(ab.2.df[-1])
> >     ab.3.fa <- get.scores(ab.3.df[-1])
> > }
> >
> > Thanks for your help,
> >
> > Justin
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
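As an alternative to merging afterwards, get.scores() could return scores aligned with every input row, with NA where a case was incomplete, so they can be bound straight back onto that wave's data frame. The sketch below assumes raw factanal() regression scores are acceptable; passing them through re.scale() first would work the same way. get.scores.aligned() and the simulated demo data are hypothetical.

```r
get.scores.aligned <- function(data) {
  cc <- complete.cases(data)
  fa <- factanal(data[cc, ], factors = 1, scores = "regression")
  out <- rep(NA_real_, nrow(data))   # one slot per input row
  out[cc] <- fa$scores[, 1]          # fill only the complete cases
  out
}

# Hypothetical demo: five indicators of one simulated factor, two NAs
set.seed(2)
n <- 60
f <- rnorm(n)
demo <- data.frame(sapply(1:5, function(j) 0.7 * f + rnorm(n, sd = 0.5)))
demo[c(5, 40), 1] <- NA

demo$score <- get.scores.aligned(demo)  # same length and order as demo
```

factanal() also accepts a formula with an na.action argument, and its help page says napredict() is applied to the scores, so na.action = na.exclude may give the same NA padding directly; that is worth checking in ?factanal before relying on it.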