[R] How to use PC1 of PCA and dim1 of MCA as a predictor in logistic regression model for data reduction

khosoda at med.kobe-u.ac.jp
Thu Aug 18 14:37:17 CEST 2011


Dear Daniel,

Thank you for your mail.
Your comment is exactly what I was worried about.

I know very little about latent class analysis, so I would like to use
multiple correspondence analysis (MCA) for data reduction. Besides, the
first dimension of the MCA captured 43% of the variance.

Do you think my use of "mjca1$rowcoord[, 1]" from the ca package for data
reduction in the previous mail is O.K.?
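
To make the question concrete, here is a minimal sketch of the workflow on
simulated data. Everything in it other than mjca(), its rowcoord component
and glm() is made up for illustration: the data frame sim.df, the latent
variable z, the helper make_bin(), the outcome column, and the object names
mjca.sim and fit. It only restates what I did; it is not meant to show that
the approach is sound.

```r
library(ca)  # provides mjca()

set.seed(1)
n <- 104
z <- rnorm(n)  # hypothetical latent trait, used only to make the factors associated

# Hypothetical helper: binary factor whose "yes" probability increases with z
make_bin <- function(z) {
  factor(ifelse(runif(length(z)) < plogis(1.5 * z), "yes", "no"),
         levels = c("no", "yes"))
}

# Simulated stand-in for the categorical predictors (not my real data)
sim.df <- data.frame(HT = make_bin(z), DM = make_bin(z), IHD = make_bin(z),
                     DL = make_bin(z), smoking = make_bin(z))
sim.df$outcome <- rbinom(n, 1, plogis(z))  # simulated binary outcome

# MCA on the factor columns, using the default settings of mjca()
mjca.sim <- mjca(sim.df[, c("HT", "DM", "IHD", "DL", "smoking")])

# First column of the row coordinates, used in place of the original factors
sim.df$NewScore <- mjca.sim$rowcoord[, 1]

# Logistic regression with the single MCA score as predictor
fit <- glm(outcome ~ NewScore, family = binomial, data = sim.df)
summary(fit)
```

In my real analysis the score would enter the model together with the 4
variables I selected from subject matter knowledge.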

Thank you for your help.

--
Kohkichi Hosoda

(11/08/18 17:39), Daniel Malter wrote:
> Pooling nominal with numeric variables and running PCA on them sounds like
> conceptual nonsense to me. You use PCA to reduce the dimensionality of the
> data if the data are numeric. For categorical data analysis, you should use
> latent class analysis or something along those lines.
>
> The fact that your first PC captures only 20 percent of the variance
> indicates that either you apply the wrong technique or that dimensionality
> reduction is of little use for these data more generally. The first step
> should generally be to check the correlations/associations between the
> variables to inspect whether what you intend to do makes sense.
>
> HTH,
> Daniel
>
>
>
> khosoda wrote:
>>
>> Hi all,
>>
>> I'm trying to do model reduction for logistic regression. I have 13
>> predictors (4 continuous variables and 9 binary variables). Using subject
>> matter knowledge, I selected 4 important variables. For the remaining 9
>> variables, I tried to perform data reduction by principal component
>> analysis (PCA). However, 8 of the 9 variables were binary and only one
>> was continuous. I transformed the data with transcan from the rms package
>> and ran PCA with princomp. PC1 explained only 20% of the variance. Still,
>> I used PC1 as a predictor in the logistic model and obtained some results.
>>
>> Then, I tried multiple correspondence analysis (MCA). The only
>> continuous variable was age. I transformed the "age" variable into the
>> factor variable "age_Q" as follows.
>>
>>> quantile(mydata.df$age)
>>     0%   25%   50%   75%  100%
>> 53.00 66.75 72.00 76.25 85.00
>>> age_Q <- cut(mydata.df$age, right=TRUE, breaks=c(-Inf, 66, 72, 76, Inf),
>>                labels=c("53-66", "67-72", "73-76", "77-85"))
>>> table(age_Q)
>> age_Q
>> 53-66 67-72 73-76 77-85
>>     26    27    25    26
>>
>> Then, I used mjca from the ca package for MCA.
>>
>>> mjca1 <- mjca(mydata.df[, c("age_Q", "sex", "symptom", "HT", "DM",
>>                             "IHD", "smoking", "DL", "Statin")])
>>
>>> summary(mjca1)
>>
>> Principal inertias (eigenvalues):
>>
>>   dim    value      %   cum%   scree plot
>>   1      0.009592  43.4  43.4  *************************
>>   2      0.003983  18.0  61.4  **********
>>   3      0.001047   4.7  66.1  **
>>   4      0.000367   1.7  67.8
>>          -------- -----
>>   Total: 0.022111
>>
>> Dimension 1 explained 43% of the variance. Then, I was wondering
>> which values I could use in the same way as PC1 from PCA. I explored
>> the mjca1 object and found "rowcoord".
>>
>>> mjca1$rowcoord
>>                [,1]          [,2]        [,3]         [,4]
>>    [1,]  0.07403748  0.8963482181  0.10828273  1.581381849
>>    [2,]  0.92433996 -1.1497911361  1.28872517  0.304065865
>>    [3,]  0.49833354  0.6482940556 -2.11114314  0.365023261
>>    [4,]  0.18998290 -1.4028117048 -1.70962159  0.451951744
>>    [5,] -0.13008173  0.2557656854  1.16561601 -1.012992485
>> .........................................................
>> .........................................................
>> [101,] -1.86940216  0.5918128751  0.87352987 -1.118865117
>> [102,] -2.19096615  1.2845448725  0.25227354 -0.938612155
>> [103,]  0.77981265 -1.1931087587  0.23934034  0.627601413
>> [104,] -2.37058237 -1.4014005013 -0.73578248 -1.455055095
>>
>> Then, I used mjca1$rowcoord[, 1] as follows.
>>
>>> mydata.df$NewScore <- mjca1$rowcoord[, 1]
>>
>> I used this "NewScore" as one of the predictors in the model instead of
>> the original 9 variables.
>>
>> The final logistic model obtained by use of MCA was similar to the one
>> obtained by use of PCA.
>>
>> My questions are:
>>
>> 1. Is it O.K. to perform PCA on data consisting of 1 continuous
>> variable and 8 binary variables?
>>
>> 2. Is it O.K. to transform age from a continuous variable into a
>> factor variable for MCA?
>>
>> 3. Are the values in "mjca1$rowcoord[, 1]" the correct ones to use as a
>> predictor in a logistic regression model, like PC1 from PCA?
>>
>> Thank you in advance for your help.
>>
>> --
>> Kohkichi Hosoda
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>