[R] How to use PC1 of PCA and dim1 of MCA as a predictor in a logistic regression model for data reduction
khosoda at med.kobe-u.ac.jp
Wed Aug 17 17:12:10 CEST 2011
Hi all,
I'm trying to do model reduction for logistic regression. I have 13
predictors (4 continuous variables and 9 binary variables). Using subject
matter knowledge, I selected 4 important variables. For the remaining 9
variables, I tried to perform data reduction by principal component
analysis (PCA). However, 8 of these 9 variables were binary and only one
was continuous. I transformed the data with transcan() (from the Hmisc
package, which rms loads) and ran PCA with princomp(). PC1 explained only
20% of the variance. Still, I used PC1 as a predictor in the logistic
model and obtained some results.
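A minimal base-R sketch of the PCA step, on simulated data: the variable names, sample size and values below are hypothetical, and plain standardization via prcomp(scale. = TRUE) stands in for the transcan() transformation described above. prcomp() is used here instead of princomp(); on standardized data both give the same component directions (up to sign).

```r
# Simulated stand-ins for the 9 remaining predictors (hypothetical subset).
set.seed(1)
n <- 104
X <- data.frame(
  age = round(runif(n, 53, 85)),
  HT  = rbinom(n, 1, 0.5),
  DM  = rbinom(n, 1, 0.3),
  IHD = rbinom(n, 1, 0.2)
)
y <- rbinom(n, 1, 0.4)  # hypothetical binary outcome

# PCA on centered and scaled columns (stand-in for transcan-transformed data).
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each component.
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
print(round(prop_var, 3))

# Use the PC1 scores as a single summary predictor in a logistic model.
PC1 <- pca$x[, 1]
fit_pca <- glm(y ~ PC1, family = binomial)
print(coef(summary(fit_pca)))
```

With mostly binary inputs, a low proportion for PC1 (like the 20% above) is not unusual, because each 0/1 column carries little shared variance.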
Then I tried multiple correspondence analysis (MCA). The only continuous
variable was age, so I converted "age" into a factor variable "age_Q" as
follows:
> quantile(mydata.df$age)
0% 25% 50% 75% 100%
53.00 66.75 72.00 76.25 85.00
> age_Q <- cut(mydata.df$age, right=TRUE, breaks=c(-Inf, 66, 72, 76, Inf),
labels=c("53-66", "67-72", "73-76", "77-85"))
> table(age_Q)
age_Q
53-66 67-72 73-76 77-85
26 27 25 26
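The break points above are hard-coded, rounded versions of the quartiles. As a sketch (with simulated ages standing in for mydata.df$age), the breaks can instead be taken directly from quantile(), so the grouping follows the data automatically; include.lowest = TRUE keeps the minimum age in the first interval.

```r
# Hypothetical ages; in practice use mydata.df$age.
set.seed(2)
age <- round(runif(104, 53, 85))

# Quartile-based breaks (0%, 25%, 50%, 75%, 100%).
brks  <- quantile(age, probs = seq(0, 1, 0.25))
age_Q <- cut(age, breaks = brks, include.lowest = TRUE)
print(table(age_Q))
```

With tied ages the four groups need not be exactly equal in size, and cut() will stop with an error if two quantiles coincide.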
Then I used mjca() from the ca package for MCA.
> mjca1 <- mjca(mydata.df[, c("age_Q","sex","symptom", "HT", "DM",
"IHD","smoking","DL", "Statin")])
> summary(mjca1)
Principal inertias (eigenvalues):
dim value % cum% scree plot
1 0.009592 43.4 43.4 *************************
2 0.003983 18.0 61.4 **********
3 0.001047 4.7 66.1 **
4 0.000367 1.7 67.8
-------- -----
Total: 0.022111
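To show where principal inertias and row coordinates come from, here is a base-R sketch of plain indicator-matrix MCA (correspondence analysis of the 0/1 indicator matrix) on simulated factors. It is an illustration, not a reimplementation of mjca(): assuming mjca()'s default lambda = "adjusted", its printed inertias are adjusted values and will differ from the indicator-matrix inertias computed here.

```r
# Hypothetical categorical data (stand-ins for age_Q, sex, HT, DM, ...).
set.seed(3)
n <- 104
dat <- data.frame(
  age_Q = factor(sample(c("53-66", "67-72", "73-76", "77-85"), n, replace = TRUE)),
  sex   = factor(sample(c("F", "M"), n, replace = TRUE)),
  HT    = factor(sample(c("no", "yes"), n, replace = TRUE)),
  DM    = factor(sample(c("no", "yes"), n, replace = TRUE))
)

# Complete disjunctive (indicator) matrix: one 0/1 column per category.
Z <- do.call(cbind, lapply(dat, function(f) model.matrix(~ f - 1)))

# Correspondence analysis of Z via the SVD of the standardized residuals.
P   <- Z / sum(Z)
r   <- rowSums(P)   # row masses (1/n for every individual)
cm  <- colSums(P)   # column (category) masses
S   <- (P - outer(r, cm)) / sqrt(outer(r, cm))
dec <- svd(S)

# Principal inertias (eigenvalues) and their percentages.
inertia <- dec$d^2
pct     <- 100 * inertia / sum(inertia)
print(round(pct[1:4], 1))

# Standard row coordinates (unit weighted variance per dimension) and
# principal row coordinates (standard coordinates times singular values).
std_row  <- dec$u / sqrt(r)
prin_row <- sweep(std_row, 2, dec$d, "*")
```

For the indicator matrix the total inertia is always (J - Q)/Q, with J categories and Q variables, so percentages of it tend to be low; this is why adjusted inertias are often reported instead.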
Dimension 1 explained 43.4% of the inertia (the MCA analogue of
variance). I then wondered which values I could use in the same way as
PC1 in PCA. Exploring the mjca1 object, I found "rowcoord".
> mjca1$rowcoord
[,1] [,2] [,3] [,4]
[1,] 0.07403748 0.8963482181 0.10828273 1.581381849
[2,] 0.92433996 -1.1497911361 1.28872517 0.304065865
[3,] 0.49833354 0.6482940556 -2.11114314 0.365023261
[4,] 0.18998290 -1.4028117048 -1.70962159 0.451951744
[5,] -0.13008173 0.2557656854 1.16561601 -1.012992485
.........................................................
.........................................................
[101,] -1.86940216 0.5918128751 0.87352987 -1.118865117
[102,] -2.19096615 1.2845448725 0.25227354 -0.938612155
[103,] 0.77981265 -1.1931087587 0.23934034 0.627601413
[104,] -2.37058237 -1.4014005013 -0.73578248 -1.455055095
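If rowcoord holds standard coordinates (as in the ca package, where principal coordinates are standard coordinates multiplied by the singular values; check ?mjca for your version), the choice between standard and principal coordinates does not matter for a single predictor in a logistic model: rescaling a predictor by a constant changes only its coefficient, not the fit. A small sketch with simulated data, where sv1 is a hypothetical scale factor:

```r
# Hypothetical dimension-1 score and outcome.
set.seed(4)
n <- 104
score <- rnorm(n)                       # e.g. standard coordinates, dim 1
y <- rbinom(n, 1, plogis(0.5 * score))
sv1 <- 0.0979                           # hypothetical singular value

fit_std  <- glm(y ~ score, family = binomial)
fit_prin <- glm(y ~ I(score * sv1), family = binomial)

# Identical deviance; the slope is simply divided by sv1.
print(c(deviance(fit_std), deviance(fit_prin)))
print(coef(fit_std)[[2]] / coef(fit_prin)[[2]])  # equals sv1
```

The same reasoning applies to the sign of a dimension, which is arbitrary: flipping it flips the sign of the coefficient and leaves the fitted probabilities unchanged.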
Then I used mjca1$rowcoord[, 1] as follows:
> mydata.df$NewScore <- mjca1$rowcoord[, 1]
I used this "NewScore" as one of the predictors in the model instead of
the original 9 variables.
The final logistic model obtained using MCA was similar to the one
obtained using PCA.
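One way to check how similar the two reduced models are is to compare the two summary scores directly and compare the models' AIC. A sketch on simulated data: x1 to x4 are hypothetical stand-ins for the 4 selected variables, and pca_score and mca_score stand in for PC1 and mjca1$rowcoord[, 1].

```r
# Hypothetical data: 4 selected predictors plus two candidate summary scores.
set.seed(5)
n <- 104
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
                x4 = rbinom(n, 1, 0.5))
latent <- rnorm(n)
d$pca_score <- latent + rnorm(n, sd = 0.3)
d$mca_score <- -latent + rnorm(n, sd = 0.3)  # dimension sign is arbitrary
d$y <- rbinom(n, 1, plogis(0.8 * d$x1 + 0.5 * latent))

# Each reduced model uses 4 selected predictors + 1 score (6 parameters).
fit_pca <- glm(y ~ x1 + x2 + x3 + x4 + pca_score, family = binomial, data = d)
fit_mca <- glm(y ~ x1 + x2 + x3 + x4 + mca_score, family = binomial, data = d)

# Agreement between the scores and comparability of the fits.
score_cor <- cor(d$pca_score, d$mca_score)
aics <- AIC(fit_pca, fit_mca)
print(score_cor)
print(aics)
```

A strong (positive or negative) correlation between the two scores would explain why the two final models look alike.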
My questions are:
1. Is it OK to perform PCA on data consisting of 1 continuous
variable and 8 binary variables?
2. Is it OK to convert age from a continuous variable to a factor
variable for MCA?
3. Are the values in "mjca1$rowcoord[, 1]" the correct ones to use as a
predictor in the logistic regression model, analogous to PC1 of PCA?
Thank you in advance for your help.
--
Kohkichi Hosoda