[R] Collinearity? Cannot get logisticRidge{ridge} to work

Kengo Inagaki kengoing.gj at gmail.com
Thu May 28 00:00:22 CEST 2015


Here is the result:

> with(a,  table(Sex, Therapy1,  Outcome) )
, , Outcome = Alive

        Therapy1
Sex      no yes
  female  0   4
  male    4   5

, , Outcome = Death

        Therapy1
Sex      no yes
  female  6   3
  male    3   0
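
Read against the additive model Outcome ~ Sex + Therapy1, the table has two cells with a single outcome: female/no (0 Alive, 6 Death) and male/yes (5 Alive, 0 Death). That pattern is consistent with quasi-complete separation, which would explain the inflated standard errors. A minimal sketch, assuming the eight counts above are the entire data set (the names "counts" and "fit" are illustrative only, and "a" is rebuilt here rather than taken from the original file):

```r
# Rebuild the data from the three-way table above (assumption: these
# eight counts are the complete data set).
counts <- data.frame(
  Sex      = rep(c("female", "male"), times = 4),
  Therapy1 = rep(rep(c("no", "yes"), each = 2), times = 2),
  Outcome  = rep(c("Alive", "Death"), each = 4),
  n        = c(0, 4, 4, 5,   # Alive: female/no, male/no, female/yes, male/yes
               6, 3, 3, 0)   # Death: same cell order
)

# Expand to one row per case and turn every column into a factor.
a <- counts[rep(seq_len(nrow(counts)), counts$n),
            c("Sex", "Therapy1", "Outcome")]
a[] <- lapply(a, factor)

# Same call as in the original question. glm() is expected to warn that
# fitted probabilities numerically 0 or 1 occurred, so the warning is
# suppressed here only to keep the sketch quiet.
fit <- suppressWarnings(
  glm(Outcome ~ Sex + Therapy1, data = a, family = binomial)
)

# The standard errors should be very large relative to the estimates.
se <- summary(fit)$coefficients[, "Std. Error"]
print(se)
```

If the standard errors come out in the thousands, the problem is the separation pattern in the data and not collinearity between Sex and Therapy1.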


2015-05-27 16:57 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
>
> On May 27, 2015, at 2:49 PM, Kengo Inagaki wrote:
>
>> Thank you very much for your rapid response. I sincerely appreciate your input.
>> I am sorry for sending the previous email in HTML format.
>>
>> with(a,  table(Sex, Therapy1) )   shows the following.
>>          Therapy1
>> Sex      no yes
>>  female  6   7
>>  male    7   5
>>
>> with(a,  table(Sex, Outcome) )  and  with(a,  table(Therapy1, Outcome) )
>> give the following:
>>
>>        Outcome
>> Sex      Alive Death
>>  female     4     9
>>  male       9     3
>>
>>        Outcome
>> Therapy1 Alive Death
>>     no      4     9
>>     yes     9     3
>
> Then what about:
>
> with(a,  table(Sex, Therapy1,  Outcome) )
>
> --
> David
>
>
>>
>> As there are no zero cells, it does not seem to be complete separation.
>> I really appreciate comments.
>>
>> Kengo Inagaki
>> Memphis, TN
>>
>>
>> 2015-05-27 13:57 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
>>>
>>> On May 27, 2015, at 10:10 AM, Kengo Inagaki wrote:
>>>
>>>> I am currently working on a health care related project using R. I am
>>>> learning R while working on data analysis.
>>>>
>>>> Below is the part of the data in which I am encountering a problem.
>>>>
>>>>
>>>> Case#    Sex     Therapy1    Therapy2    Outcome
>>>>
>>>> 1        male    no          no          Alive
>>>>
>>>
>>> snipped mangled data sent in HTML
>>>
>>>>
>>>>
>>>> "Outcome" is the response variable and "Sex", "Therapy1", "Therapy2" are
>>>> predictor variables.
>>>>
>>>> All of the predictors are significantly associated with the outcome by
>>>> univariate analysis.
>>>>
>>>> Logistic regression runs fine with most of the predictors when "Sex" and
>>>> "Therapy1" are not included at the same time. (This is part of a table that
>>>> I cut out from a larger table for ease of presentation; there are more
>>>> predictors that I tested.)
>>>
>>> Please examine the data before reaching for ridge regression:
>>>
>>> What does this show: ...
>>>
>>>    with(a,  table(Sex, Therapy1) )
>>>
>>> I predict you will see a zero cell entry. Then read about "complete separation" and the so-called "Hauck-Donner effect".
>>>
>>> --
>>> David.
>>>>
>>>> However, when "Sex" and "Therapy1" are included in the logistic regression
>>>> model at the same time, the standard errors inflate and the p-values get
>>>> close to 1.
>>>>
>>>> The call used is:
>>>>
>>>>
>>>>
>>>>> Model <- glm(Outcome ~ Sex + Therapy1, data = a, family = binomial)  # "a" is the data frame holding the table above
>>>>
>>>>
>>>>
>>>> After doing some reading, I suspect this might be collinearity, as vif
>>>> values (using "vif()" function in car package) were sky high (8,875,841 for
>>>> both "Sex" and "Therapy1").
>>>>
>>>> Learning that ridge regression may be a solution, I attempted using
>>>> logisticRidge {ridge} with the following call, but I get the
>>>> accompanying error message.
>>>>
>>>>
>>>>
>>>>> logisticRidge(a$Outcome~a$Sex+a$Therapy1)
>>>>
>>>>
>>>>
>>>> Error in ifelse(y, log(p), log(1 - p)) :
>>>>   invalid to change the storage mode of a factor
>>>>
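
The message itself points at the response: ifelse() tries to coerce its first argument to logical, and base R refuses to change the storage mode of a factor. A minimal base-R sketch of that failure and of one possible workaround (recoding the factor to 0/1), using a toy factor in place of a$Outcome. The commented logisticRidge() call at the end is an untested assumption about how the recoded response would be passed, and it does not address the separation problem in the data:

```r
# Toy stand-in for a$Outcome (hypothetical values, same levels as the thread).
outcome <- factor(c("Alive", "Death", "Death", "Alive"))

# Reproduce the failure: ifelse() cannot coerce a factor to logical.
err_msg <- tryCatch(
  ifelse(outcome, 1, 0),
  error = function(e) conditionMessage(e)
)
print(err_msg)

# Workaround sketch: recode the response as integer 0/1 (1 = Death).
y <- as.integer(outcome == "Death")
print(y)

# Assumed usage with the real data (not run here; requires the ridge package):
# a$y <- as.integer(a$Outcome == "Death")
# ridge::logisticRidge(y ~ Sex + Therapy1, data = a)
```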
>>>>
>>>>
>>>> At this point I do not have an idea how to solve this and would like to
>>>> seek help.
>>>>
>>>> I really really appreciate your input!!!
>>>>
>>>>
>>>
>>>
>>> David Winsemius
>>> Alameda, CA, USA
>>>
>
> David Winsemius
> Alameda, CA, USA
>


