[R] Collinearity? Cannot get logisticRidge{ridge} to work

Kengo Inagaki kengoing.gj at gmail.com
Wed May 27 23:49:03 CEST 2015


Thank you very much for your rapid response. I sincerely appreciate your input.
I am sorry for sending the previous email in HTML format.

with(a, table(Sex, Therapy1)) shows the following:
          Therapy1
Sex      no yes
  female  6   7
  male    7   5

with(a, table(Sex, Outcome)) and with(a, table(Therapy1, Outcome))
give the following:

        Outcome
Sex      Alive Death
  female     4     9
  male       9     3

        Outcome
Therapy1 Alive Death
     no      4     9
     yes     9     3

As there are no zero cells in these tables, it does not seem to be complete separation.
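The two-way tables above check each predictor against the outcome on its own. With both predictors in one model, the likelihood depends on how the outcome falls within each Sex/Therapy1 combination, so a zero can appear in the joint table even when none appears in any two-way table. Below is a minimal sketch, using a hypothetical toy data frame in place of the real a (the values are invented for illustration and are not the poster's data). It cross-tabulates all three variables and then looks for the usual symptoms of separation in the glm() fit.

```r
# Hypothetical toy data standing in for the real data frame "a";
# the values are invented for illustration and are not the poster's data.
a <- data.frame(
  Sex      = factor(c("female", "female", "female", "male",
                      "male", "male", "female", "male")),
  Therapy1 = factor(c("no", "yes", "no", "yes", "no", "yes", "yes", "no")),
  Outcome  = factor(c("Death", "Alive", "Death", "Alive",
                      "Death", "Alive", "Death", "Alive"))
)

# Each two-way table here is free of zero cells...
print(with(a, table(Sex, Outcome)))
print(with(a, table(Therapy1, Outcome)))

# ...but the joint table (one row per Sex/Therapy1 combination) is not.
# A zero in an outcome column means every case with that combination
# had the same outcome.
print(ftable(with(a, table(Sex, Therapy1, Outcome))))

# glm() may warn here (non-convergence or fitted probabilities of 0/1).
fit <- glm(Outcome ~ Sex + Therapy1, data = a, family = binomial)

# Symptoms of (quasi-)complete separation: very large coefficients with
# huge standard errors, and fitted probabilities pinned near 0 or 1.
print(summary(fit)$coefficients)
print(range(fitted(fit)))
```

In this toy example the female/no and male/yes rows of the joint table each contain a zero, and the coefficients diverge even though both two-way tables are free of zeros.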
I would really appreciate any comments.

Kengo Inagaki
Memphis, TN


2015-05-27 13:57 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
>
> On May 27, 2015, at 10:10 AM, Kengo Inagaki wrote:
>
>> I am currently working on a health care related project using R. I am
>> learning R while working on data analysis.
>>
>> Below is the part of the data in which I am encountering a problem.
>>
>>
>> Case#  Sex   Therapy1  Therapy2  Outcome
>>
>> 1      male  no        no        Alive
>>
>
> snipped mangled data sent in HTML
>
>>
>>
>> "Outcome" is the response variable and "Sex", "Therapy1", "Therapy2" are
>> predictor variables.
>>
>> All of the predictors are significantly associated with the outcome by
>> univariate analysis.
>>
>> Logistic regression runs fine with most of the predictors when "Sex" and
>> "Therapy1" are not included at the same time. (This is part of a table that
>> I cut out from a larger table for ease of presentation; there are more
>> predictors that I tested.)
>
> Please examine the data before reaching for ridge regression:
>
> What does this show: ...
>
>     with(a,  table(Sex, Therapy1) )
>
> I predict you will see a zero cell entry. Then read about "complete separation" and the so-called "Hauck-Donner effect".
>
> --
> David.
>>
>> However, when "Sex" and "Therapy1" are included in the logistic regression
>> model at the same time, the standard errors inflate and the p-values get close to 1.
>>
>> The model call used is:
>>
>>> Model <- glm(Outcome ~ Sex + Therapy1, data = a, family = binomial)  # "a" is the data frame holding the table above
>>
>> After doing some reading, I suspect this might be collinearity, as the VIF
>> values (from the vif() function in the car package) were sky high (8,875,841
>> for both "Sex" and "Therapy1").
>>
>> Having read that ridge regression may be a solution, I tried logisticRidge()
>> from the ridge package with the following call, but I get the accompanying
>> error message:
>>
>>> logisticRidge(a$Outcome~a$Sex+a$Therapy1)
>>
>> Error in ifelse(y, log(p), log(1 - p)) :
>>   invalid to change the storage mode of a factor
>>
>> At this point I have no idea how to solve this and would like to
>> seek help.
>>
>> I would really appreciate your input!
>>
>
>
> David Winsemius
> Alameda, CA, USA
>
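On the logisticRidge() error quoted above: the message is raised by base R's ifelse(), which converts its test argument to logical and refuses to do so for a factor. Presumably the y inside logisticRidge() is the factor Outcome. Below is a minimal sketch of a workaround that assumes recoding the response as 0/1 is enough. The toy data frame is hypothetical, treating "Death" as the event coded 1 is an assumption, and the call assumes logisticRidge() in the ridge package accepts a formula and a data argument. That call is only attempted when the package is installed.

```r
# Hypothetical toy data standing in for the real data frame "a".
a <- data.frame(
  Sex      = factor(c("female", "male", "female", "male", "female", "male")),
  Therapy1 = factor(c("no", "yes", "yes", "no", "no", "yes")),
  Outcome  = factor(c("Death", "Alive", "Alive", "Death", "Death", "Alive"))
)

# Reproduce the reported error with base R alone: ifelse() tries to
# coerce its test argument to logical, which is not allowed for a factor.
err <- tryCatch(ifelse(a$Outcome, 1, 0), error = conditionMessage)
print(err)

# Recode the response as 0/1 (assumption: "Death" is the event of interest)
# and cross-check the recoding against the original factor.
a$Death <- as.integer(a$Outcome == "Death")
print(table(a$Outcome, a$Death))

# Assumed call: logisticRidge(formula, data) from the ridge package,
# attempted only when the package is installed.
if (requireNamespace("ridge", quietly = TRUE)) {
  ridge_fit <- ridge::logisticRidge(Death ~ Sex + Therapy1, data = a)
  print(ridge_fit)
}
```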



More information about the R-help mailing list