[R] Collinearity? Cannot get logisticRidge{ridge} to work

Kengo Inagaki kengoing.gj at gmail.com
Wed May 27 19:10:29 CEST 2015


I am currently working on a health care related project using R, and I am
learning R as I go through the data analysis.

Below is the part of the data in which I am encountering a problem.



Case#   Sex      Therapy1   Therapy2   Outcome
1       male     no         no         Alive
2       female   no         no         Death
3       male     no         no         Alive
4       female   no         no         Death
5       male     no         no         Death
6       male     no         no         Alive
7       male     yes        no         Alive
8       female   no         no         Death
9       male     no         yes        Alive
10      female   no         no         Death
11      female   yes        yes        Death
12      female   yes        no         Death
13      female   yes        no         Death
14      female   yes        no         Alive
15      male     yes        no         Alive
16      male     yes        no         Alive
17      male     no         yes        Death
18      male     no         yes        Death
19      male     yes        no         Alive
20      female   no         yes        Death
21      female   yes        no         Alive
22      female   no         yes        Death
23      male     yes        no         Alive
24      female   yes        no         Alive
25      female   yes        no         Alive



"Outcome" is the response variable and "Sex", "Therapy1", "Therapy2" are
predictor variables.
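
For anyone who wants to reproduce this, here is a minimal sketch of how the 25 rows shown above could be entered as a data frame. The name "a" matches the object used later in this post; storing the columns as factors is an assumption about how the original data were read in.

```r
# Rows 1-25 of the table above, entered column by column.
a <- data.frame(
  Sex = c("male", "female", "male", "female", "male", "male", "male",
          "female", "male", "female", "female", "female", "female",
          "female", "male", "male", "male", "male", "male", "female",
          "female", "female", "male", "female", "female"),
  Therapy1 = c("no", "no", "no", "no", "no", "no", "yes", "no", "no", "no",
               "yes", "yes", "yes", "yes", "yes", "yes", "no", "no", "yes",
               "no", "yes", "no", "yes", "yes", "yes"),
  Therapy2 = c("no", "no", "no", "no", "no", "no", "no", "no", "yes", "no",
               "yes", "no", "no", "no", "no", "no", "yes", "yes", "no",
               "yes", "no", "yes", "no", "no", "no"),
  Outcome = c("Alive", "Death", "Alive", "Death", "Death", "Alive", "Alive",
              "Death", "Alive", "Death", "Death", "Death", "Death", "Alive",
              "Alive", "Alive", "Death", "Death", "Alive", "Death", "Alive",
              "Death", "Alive", "Alive", "Alive"),
  stringsAsFactors = TRUE  # assumption: columns are stored as factors
)

# Quick structural check of what was entered.
str(a)
table(a$Outcome)
```

With factor columns, R orders the levels alphabetically, so "Alive" is the reference level of "Outcome" and a binomial glm() models the probability of "Death".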

All of the predictors are significantly associated with the outcome by
univariate analysis.

Logistic regression runs fine with most of the predictors as long as "Sex" and
"Therapy1" are not included at the same time. (This is part of a table that
I cut out from a larger one for ease of presentation; there are more
predictors that I tested.)

However, when "Sex" and "Therapy1" are included in the logistic regression
model at the same time, the standard errors inflate and the p-values get
close to 1.

The formula used is,



> Model <- glm(Outcome ~ Sex + Therapy1, data = a, family = binomial)  # "a" is the data frame holding the table above
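
As a self-contained sketch of this step, the code below rebuilds only the three columns involved from the 25 rows shown, tabulates the outcome within each Sex x Therapy1 cell, and refits the same model. It is limited to the rows posted here, so the numbers need not match the larger table. In these rows the male/yes cell contains only "Alive" and the female/no cell contains only "Death", which may be worth checking alongside collinearity.

```r
# Only the columns used in the model, taken from the 25 rows in this post.
a <- data.frame(
  Sex = factor(c("male", "female", "male", "female", "male", "male", "male",
                 "female", "male", "female", "female", "female", "female",
                 "female", "male", "male", "male", "male", "male", "female",
                 "female", "female", "male", "female", "female")),
  Therapy1 = factor(c("no", "no", "no", "no", "no", "no", "yes", "no", "no",
                      "no", "yes", "yes", "yes", "yes", "yes", "yes", "no",
                      "no", "yes", "no", "yes", "no", "yes", "yes", "yes")),
  Outcome = factor(c("Alive", "Death", "Alive", "Death", "Death", "Alive",
                     "Alive", "Death", "Alive", "Death", "Death", "Death",
                     "Death", "Alive", "Alive", "Alive", "Death", "Death",
                     "Alive", "Death", "Alive", "Death", "Alive", "Alive",
                     "Alive"))
)

# Outcome counts within every Sex x Therapy1 combination.
cells <- xtabs(~ Sex + Therapy1 + Outcome, data = a)
print(ftable(cells))

# Same model as in the post; glm() may warn about fitted probabilities
# numerically equal to 0 or 1 on these rows.
Model <- glm(Outcome ~ Sex + Therapy1, data = a, family = binomial)
print(summary(Model)$coefficients)
```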



After doing some reading, I suspect this might be collinearity, as the VIF
values (from the "vif()" function in the car package) were sky high
(8,875,841 for both "Sex" and "Therapy1").

Learning that ridge regression may be a solution, I tried
logisticRidge {ridge} with the following call, but I get the
accompanying error message.



>logisticRidge(a$Outcome~a$Sex+a$Therapy1)



Error in ifelse(y, log(p), log(1 - p)) :
  invalid to change the storage mode of a factor
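
The message itself can be reproduced in base R without the ridge package: ifelse() needs a logical test and refuses a factor. The sketch below uses the outcomes of the first six cases only, and the 0/1 recoding (1 for "Death") is one possible choice, not something taken from the package documentation.

```r
# Outcome for the first six cases in the table, stored as a factor.
outcome <- factor(c("Alive", "Death", "Alive", "Death", "Death", "Alive"))

# Passing a factor as the test of ifelse() triggers the same error text.
msg <- tryCatch(
  ifelse(outcome, 1, 0),
  error = function(e) conditionMessage(e)
)
print(msg)

# One possible recoding to a numeric response: 1 = "Death", 0 = "Alive".
died <- as.numeric(outcome == "Death")
print(died)
```

If the response were recoded this way into a column of "a" (say a hypothetical "Died"), the call might be written as logisticRidge(Died ~ Sex + Therapy1, data = a). That call is not run here, so whether it then accepts the factor predictors is an open assumption.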



At this point I have no idea how to solve this and would like to ask for
help.

I would really appreciate your input!
