[R] Maintaining data order in factanal with missing data
David Carlson
dcarlson at tamu.edu
Fri Jul 26 18:12:58 CEST 2013
When you use complete.cases(), it creates a logical vector
that selects the cases with no missing values, but it does not
change the row names of the data.frame. Those row names are
carried through to the factor scores, so you could link the
scores back to the original rows that way. Alternatively, you
could use na.exclude as the na.action in the call to factanal()
instead of subsetting with complete.cases(). That should pad
the output with NAs for the cases that have missing data.
For this you have to use the formula version of factanal():
> set.seed(42)
> example <- data.frame(x=rnorm(15), y=rnorm(15), z=rnorm(15))
> to.na <- cbind(sample.int(15, 3), sample.int(3, 3))
> example[to.na] <- NA
> out <- factanal(~x+y+z, 1, data=example, na.action=na.omit,
+     scores="regression")
> # With na.omit the cases with missing values are gone, as
> # indicated by the missing row numbers
> out$scores
Factor1
2 -0.92604879
4 0.10731539
5 -0.24370504
6 0.07357697
7 0.69905895
8 -0.17646575
9 1.58430095
10 -0.35934769
12 1.07671299
13 -1.47487960
14 -0.30235156
15 -0.05816682
> out <- factanal(~x+y+z, 1, data=example,
+     na.action=na.exclude, scores="regression")
> # With na.exclude, the cases are kept out of the analysis but
> # the rows are preserved in the factor scores output
> out$scores
Factor1
1 NA
2 -0.92604879
3 NA
4 0.10731539
5 -0.24370504
6 0.07357697
7 0.69905895
8 -0.17646575
9 1.58430095
10 -0.35934769
11 NA
12 1.07671299
13 -1.47487960
14 -0.30235156
15 -0.05816682
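For the first option (linking through the row names), a minimal sketch might look like the following. The data frame, its dmid column, the variable names v1 to v4 and the pattern of missing values are all made up for illustration; it only assumes that the scores keep the row names of the complete cases, as described above.

```r
set.seed(1)
n <- 20
f <- rnorm(n)                            # hypothetical common factor
dat <- data.frame(dmid = 1000 + 1:n,     # hypothetical unique case ID
                  v1 = f + rnorm(n, sd = 0.5),
                  v2 = f + rnorm(n, sd = 0.5),
                  v3 = f + rnorm(n, sd = 0.5),
                  v4 = f + rnorm(n, sd = 0.5))
dat[c(1, 3, 11), "v1"] <- NA             # made-up missing values

vars <- c("v1", "v2", "v3", "v4")
cc <- complete.cases(dat[vars])          # TRUE for rows without NA
fit <- factanal(dat[cc, vars], factors = 1, scores = "regression")

# The scores carry the row names of the complete cases, so match()
# finds the position of each score in the original data frame.
pos <- match(rownames(fit$scores), rownames(dat))
dat$score <- NA_real_                    # start with NA everywhere
dat$score[pos] <- fit$scores[, 1]        # fill in the complete cases
```

Since dat still holds the ID column, the new score column could then be merged into a master data frame by that ID.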
-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352
----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of PIKAL Petr
Sent: Friday, July 26, 2013 9:06 AM
To: s00123776 at myacu.edu.au; 'Justin Delahunty';
r-help at r-project.org
Subject: Re: [R] Maintaining data order in factanal with
missing data
Hi
There are probably better options, but
merge(data.frame(x=1:154), data.frame(x=names(ab.1.fa[[1]]),
y=ab.1.fa[[1]]), all.x=TRUE)
gives you a data frame with NA in y for every row of the first
data.frame that has no matching score.
You can probably automate the process a bit with the nrow
function.
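One way the nrow idea could be sketched is shown below. The small data frame df and the named vector scores are hypothetical stand-ins for the original data and for the named score vector returned by the factor analysis; the column name row is also just an illustration.

```r
# Hypothetical stand-ins: 6 cases, two of them with missing data,
# and a named score vector for the 4 complete cases only.
df <- data.frame(dmid = 1:6, v = c(2.1, NA, 3.3, 1.8, NA, 2.7))
scores <- c("1" = 0.4, "3" = -1.2, "4" = 0.9, "6" = -0.1)

# One key per original row, however many rows there are
keys <- data.frame(row = seq_len(nrow(df)))

# Scores with their row number recovered from the names
found <- data.frame(row = as.integer(names(scores)),
                    score = unname(scores))

# all.x = TRUE keeps every original row; rows without a score get NA
padded <- merge(keys, found, by = "row", all.x = TRUE)
df$score <- padded$score
```

merge() sorts on the key column by default, so padded comes back in the original row order and its score column can be assigned straight into df.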
Regards
Petr
> -----Original Message-----
> From: Justin Delahunty [mailto:ACU at genius.net.au]
> Sent: Friday, July 26, 2013 3:34 PM
> To: PIKAL Petr; 'Justin Delahunty'; 'Justin Delahunty';
r-help at r-
> project.org
> Subject: RE: [R] Maintaining data order in factanal with
missing data
>
> Hi Petr,
>
> So sorry, I accidentally attached the complete data set rather
> than the one with missing values. I've attached the correct file
> to this email. RE: init.dfs() being local, I hadn't even thought
> of that. I've been away from OOP for close to 15 years now, so it
> might be time to revise!
>
> The problem I have is that with missing values the list of factor
> scores returned (ab.w1.fa$factor.scores) does not map onto the
> originating data frame (ab.w1.df), as it no longer includes the
> cases which had missing values. So while the original data set for
> ab.w1.df contains 154 ordered cases, the factor analysis contains
> only 150.
>
> I am seeking a way to map the values derived from the factor
> analysis (ab.w1.fa$factor.scores) back to their original ordered
> position, so that these factor score variables may be merged back
> into the master data frame (ab.df). A unique ID for each case is
> available ($dmid), which I had thought to use when merging the new
> variables; however, I don't know how to implement this.
>
>
> Thanks for your help,
>
> Justin
>
>
> -----Original Message-----
> From: PIKAL Petr [mailto:petr.pikal at precheza.cz]
> Sent: Friday, 26 July 2013 10:59 PM
> To: Justin Delahunty; Justin Delahunty; r-help at r-project.org
> Subject: RE: [R] Maintaining data order in factanal with
missing data
>
> Hi
>
> Well, the function init.dfs does nothing useful: the data frames
> created inside it do not propagate to the global environment, and
> the function does not return them either.
>
> The last line (when used outside a function) gives warnings, but
> there is no sign of an error.
>
> When
>
> > head(ab.1.df)
> dmid g5oab2 g53 g54 g55 g5ovb1
> 1 1 1.418932 1.805227 2.791152 3.624116 3.425586
> 2 2 2.293907 1.187830 1.611237 1.748526 3.816533
> 3 3 2.836536 2.679523 1.279639 2.674986 2.452395
> 4 4 1.872259 3.278359 1.785872 2.458315 1.146480
> 5 5 1.467195 1.180747 3.564127 3.007682 2.109506
> 6 6 3.098512 3.151974 3.969379 3.750571 1.497358
> > head(ab.2.df)
> dmid w2oab3 w22 w23 w24 w2ovb1
> 1 1 4.831362 5.522764 7.809366 6.969172 7.398385
> 2 2 6.706346 4.101742 1.434697 5.266775 5.357641
> 3 3 3.653806 2.666885 1.209326 5.125556 4.963374
> 4 4 7.221255 7.649152 6.540398 6.648506 2.576081
> 5 5 1.848023 5.044314 2.761881 3.307220 1.454234
> 6 6 7.606429 4.911766 2.034813 2.638573 2.818834
> > head(ab.3.df)
> dmid w3oab3 w3oab4 w3oab7 w3oab8 w3ovb1
> 1 1 5.835609 6.108220 6.587721 2.451461 2.785467
> 2 2 4.973198 1.196815 6.388056 1.110877 4.226463
> 3 3 3.800367 6.697287 5.235345 6.666829 6.319073
> 4 4 1.093141 1.477773 2.269252 3.194978 4.916342
> 5 5 1.975060 7.204516 4.825435 1.775874 3.484027
> 6 6 3.273361 2.243805 5.326547 5.720892 6.118723
> >
>
> > str(ab.1.fa)
> List of 2
>  $ rescaled.scores: Named num [1:154] 3.43 3.83 2.43 1.1 2.08 ...
>   ..- attr(*, "names")= chr [1:154] "1" "2" "3" "4" ...
>  $ factor.loadings: Named num [1:5] -0.0106 -0.0227 -0.1093 -0.0912 0.9975
>   ..- attr(*, "names")= chr [1:5] "g5oab2" "g53" "g54" "g55" ...
> > str(ab.2.fa)
> List of 2
>  $ rescaled.scores: Named num [1:154] 6.34 5.24 5.3 1.91 2.16 ...
>   ..- attr(*, "names")= chr [1:154] "1" "2" "3" "4" ...
>  $ factor.loadings: Named num [1:5] -0.2042 0.0063 -0.2287 -0.0119 0.7138
>   ..- attr(*, "names")= chr [1:5] "w2oab3" "w22" "w23" "w24" ...
> > str(ab.3.fa)
> List of 2
>  $ rescaled.scores: Named num [1:154] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
>   ..- attr(*, "names")= chr [1:154] "1" "2" "3" "4" ...
>  $ factor.loadings: Named num [1:5] -0.1172 0.0128 -0.0968 0.106 0.9975
>   ..- attr(*, "names")= chr [1:5] "w3oab3" "w3oab4" "w3oab7" "w3oab8" ...
>
> Anyway I have no idea what you consider wrong?
>
> Regards
> Petr
>
>
>
> > -----Original Message-----
> > From: Justin Delahunty [mailto:ACU at genius.net.au]
> > Sent: Friday, July 26, 2013 2:22 PM
> > To: PIKAL Petr; 'Justin Delahunty'; r-help at r-project.org
> > Subject: RE: [R] Maintaining data order in factanal with
missing data
> >
> > Hi Petr,
> >
> > Thanks for the quick response. Unfortunately I cannot share the
> > data I am working with; however, please find attached a suitable
> > R workspace with generated data. It has the appropriate variable
> > names, only the data has been changed.
> >
> > The last function in the list (init.dfs()) I call to subset the
> > overall data set into the three waves, then conduct the factor
> > analysis on each (1 factor CFA); it's just in a function to ease
> > re-typing in a new workspace.
> >
> >
> > Thanks,
> >
> > Justin
> >
> > -----Original Message-----
> > From: PIKAL Petr [mailto:petr.pikal at precheza.cz]
> > Sent: Friday, 26 July 2013 7:35 PM
> > To: Justin Delahunty; r-help at r-project.org
> > Subject: RE: [R] Maintaining data order in factanal with
missing data
> >
> > Hi
> >
> > You provided the functions, so far so good. But without data it
> > would be quite difficult to understand what the functions do and
> > where the issue could be.
> >
> > I suspect a combination of the complete-cases selection together
> > with subset and factor behaviour. But I could be completely off
> > target too.
> >
> > Petr
> >
> > > -----Original Message-----
> > > From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-
> > > project.org] On Behalf Of s00123776 at myacu.edu.au
> > > Sent: Friday, July 26, 2013 9:35 AM
> > > To: r-help at r-project.org
> > > Subject: [R] Maintaining data order in factanal with
missing data
> > >
> > > Hi,
> > >
> > >
> > >
> > > I'm new to R, so sorry if this has a simple answer. I'm
> > > currently trying to collapse some ordinal variables into a
> > > composite; the program ideally should take a data frame as
> > > input, perform a factor analysis, compute factor scores, sds,
> > > etc., and return the rescaled scores and loadings. The
> > > difficulty I'm having is that my data set contains a number of
> > > NAs, which I am excluding from the analysis using
> > > complete.cases(), and thus the incomplete cases are "skipped".
> > > These functions are for a longitudinal data set with repeated
> > > waves of data, so the final rescaled scores from each wave need
> > > to be saved as variables grouped by a unique ID (DMID). The
> > > functions I'm trying to implement are as follows:
> > >
> > >
> > >
> > > weighted.sd<-function(x,w){
> > >   sum.w<-sum(w)
> > >   sum.w2<-sum(w^2)
> > >   mean.w<-sum(x*w)/sum(w)
> > >   x.sd.w<-sqrt((sum.w/(sum.w^2-sum.w2))*sum(w*(x-mean.w)^2))
> > >   return(x.sd.w)
> > > }
> > >
> > >
> > >
> > > re.scale<-function(f.scores, raw.data, loadings){
> > >   # standardise the factor scores: subtract the mean, divide by sd
> > >   fz.scores<-(f.scores-mean(f.scores))/(sd(f.scores))
> > >   means<-apply(raw.data,1,weighted.mean,w=loadings)
> > >   sds<-apply(raw.data,1,weighted.sd,w=loadings)
> > >   grand.mean<-mean(means)
> > >   grand.sd<-mean(sds)
> > >   final.scores<-((fz.scores*grand.sd)+grand.mean)
> > >   return(final.scores)
> > > }
> > >
> > >
> > >
> > > get.scores<-function(data){
> > >   fact<-factanal(data[complete.cases(data),],factors=1,
> > >                  scores="regression")
> > >   f.scores<-fact$scores[,1]
> > >   f.loads<-fact$loadings[,1]
> > >   rescaled.scores<-re.scale(f.scores,
> > >                             data[complete.cases(data),], f.loads)
> > >   output.list<-list(rescaled.scores, f.loads)
> > >   names(output.list)<-c("rescaled.scores", "factor.loadings")
> > >   return(output.list)
> > > }
> > >
> > >
> > >
> > > init.dfs<-function(){
> > >   ab.1.df<-subset(ab.df,,select=c(dmid,g5oab2:g5ovb1))
> > >   ab.2.df<-subset(ab.df,,select=c(dmid,w2oab3:w2ovb1))
> > >   ab.3.df<-subset(ab.df,,select=c(dmid,
> > >                   w3oab3, w3oab4, w3oab7, w3oab8, w3ovb1))
> > >   ab.1.fa<-get.scores(ab.1.df[-1])
> > >   ab.2.fa<-get.scores(ab.2.df[-1])
> > >   ab.3.fa<-get.scores(ab.3.df[-1])
> > > }
> > >
> > >
> > >
> > > Thanks for your help,
> > >
> > >
> > >
> > > Justin
> > >
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
http://www.R-project.org/posting-
> > > guide.html and provide commented, minimal,
self-contained,
> > > reproducible code.
> >
>
>