[R] Collinearity? Cannot get logisticRidge{ridge} to work

Kengo Inagaki kengoing.gj at gmail.com
Thu May 28 00:06:01 CEST 2015


I had not understood complete separation well.
Thank you very much for clarification.

Kengo

2015-05-27 17:03 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
>
> On May 27, 2015, at 3:00 PM, Kengo Inagaki wrote:
>
>> Here is the result-
>>
>>> with(a,  table(Sex, Therapy1,  Outcome) )
>> , , Outcome = Alive
>>
>>        Therapy1
>> Sex      no yes
>>  female  0   4
>>  male    4   5
>>
>> , , Outcome = Death
>>
>>        Therapy1
>> Sex      no yes
>>  female  6   3
>>  male    3   0
>
> So no survivors when Female had no-Therapy1 and no deaths with the opposite for those variables (Male with Therapy1). Complete separation.
>
> --
> David.
>
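
A minimal sketch of the point made above, assuming only the cell counts in the three-way table. The data frame name `a` follows the thread; the helper objects `counts`, `tab`, `zero_cells`, `fit` and `se` are illustrative names, not part of the original posts. It rebuilds one row per patient, confirms that the two-way Sex by Therapy1 table has no empty cells while the three-way table has two, and then fits the glm() call from the original question to show how large the standard errors become.

```r
# Rebuild one row per patient from the cell counts in the three-way table.
counts <- expand.grid(
  Sex      = c("female", "male"),
  Therapy1 = c("no", "yes"),
  Outcome  = c("Alive", "Death"),
  stringsAsFactors = TRUE
)
# expand.grid() varies Sex fastest, then Therapy1, then Outcome.
counts$n <- c(0, 4, 4, 5,   # Outcome = Alive
              6, 3, 3, 0)   # Outcome = Death
a <- counts[rep(seq_len(nrow(counts)), counts$n),
            c("Sex", "Therapy1", "Outcome")]

# The two-way table of the predictors has no empty cells ...
print(with(a, table(Sex, Therapy1)))

# ... but the three-way table does.
tab <- with(a, table(Sex, Therapy1, Outcome))
zero_cells <- as.data.frame(tab)
zero_cells <- zero_cells[zero_cells$Freq == 0, ]
print(zero_cells)

# Same model as in the original post. Warnings are suppressed here only to
# keep the output short; glm() may warn about fitted probabilities of 0 or 1.
fit <- suppressWarnings(
  glm(Outcome ~ Sex + Therapy1, data = a, family = binomial)
)
se <- coef(summary(fit))[, "Std. Error"]
print(se)
```

The empty cells are female/no/Alive and male/yes/Death, so within those two Sex by Therapy1 groups the outcome is perfectly predicted, and the maximum-likelihood estimates drift toward infinity instead of settling.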
>>
>>
>> 2015-05-27 16:57 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
>>>
>>> On May 27, 2015, at 2:49 PM, Kengo Inagaki wrote:
>>>
>>>> Thank you very much for your rapid response. I sincerely appreciate your input.
>>>> I am sorry for sending the previous email in HTML format.
>>>>
>>>> with(a,  table(Sex, Therapy1) )   shows the following.
>>>>         Therapy1
>>>> Sex      no yes
>>>>   female  6   7
>>>>   male    7   5
>>>>
>>>> and with(a,  table(Sex, Outcome) ) and with(a,  table(Therapy1, Outcome) )
>>>> give the following:
>>>>
>>>>       Outcome
>>>> Sex      Alive Death
>>>> female     4     9
>>>> male       9     3
>>>>
>>>>       Outcome
>>>> Therapy1 Alive Death
>>>>    no      4     9
>>>>    yes     9     3
>>>
>>> Then what about:
>>>
>>> with(a,  table(Sex, Therapy1,  Outcome) )
>>>
>>> --
>>> David
>>>
>>>
>>>>
>>>> As there are no zero cells, it does not seem to be complete separation.
>>>> I would really appreciate any comments.
>>>>
>>>> Kengo Inagaki
>>>> Memphis, TN
>>>>
>>>>
>>>> 2015-05-27 13:57 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
>>>>>
>>>>> On May 27, 2015, at 10:10 AM, Kengo Inagaki wrote:
>>>>>
>>>>>> I am currently working on a health care related project using R. I am
>>>>>> learning R while working on data analysis.
>>>>>>
>>>>>> Below is the part of the data in which I am encountering a problem.
>>>>>>
>>>>>>
>>>>>> Case#    Sex     Therapy1    Therapy2    Outcome
>>>>>>
>>>>>> 1        male    no          no          Alive
>>>>>>
>>>>>
>>>>> snipped mangled data sent in HTML
>>>>>
>>>>>>
>>>>>>
>>>>>> "Outcome" is the response variable and "Sex", "Therapy1", "Therapy2" are
>>>>>> predictor variables.
>>>>>>
>>>>>> All of the predictors are significantly associated with the outcome by
>>>>>> univariate analysis.
>>>>>>
>>>>>> Logistic regression runs fine with most of the predictors when "Sex" and
>>>>>> "Therapy1" are not included at the same time. (This is part of a table that
>>>>>> I cut out from a larger table for ease of presentation; there are more
>>>>>> predictors that I tested.)
>>>>>
>>>>> Please examine the data before reaching for ridge regression:
>>>>>
>>>>> What does this show: ...
>>>>>
>>>>>   with(a,  table(Sex, Therapy1) )
>>>>>
>>>>> I predict you will see a zero cell entry. Then read about "complete separation" and the so-called "Hauck-Donner effect".
>>>>>
>>>>> --
>>>>> David.
>>>>>>
>>>>>> However, when "Sex" and "Therapy1" are included in the logistic regression
>>>>>> model at the same time, the standard errors inflate and the p-values get close to 1.
>>>>>>
>>>>>> The formula used is,
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Model<-glm(Outcome~Sex+Therapy1,data=a,family=binomial) # "a" is the data frame holding the table above
>>>>>>
>>>>>>
>>>>>>
>>>>>> After doing some reading, I suspect this might be collinearity, as vif
>>>>>> values (using "vif()" function in car package) were sky high (8,875,841 for
>>>>>> both "Sex" and "Therapy1").
>>>>>>
>>>>>> Learning that ridge regression may be a solution, I attempted using
>>>>>> logisticRidge {ridge} with the following call, but I get the
>>>>>> accompanying error message.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> logisticRidge(a$Outcome~a$Sex+a$Therapy1)
>>>>>>
>>>>>>
>>>>>>
>>>>>> Error in ifelse(y, log(p), log(1 - p)) :
>>>>>>
>>>>>> invalid to change the storage mode of a factor
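
A hedged sketch of where this error most likely comes from, using base R only. The message points at an ifelse() call whose test argument is the response; ifelse() tries to coerce its test to logical, and R refuses that coercion for a factor. The vector `outcome` below is a made-up stand-in for a$Outcome, and `msg` and `y` are illustrative names. Recoding the response as 0/1 avoids the coercion; that logisticRidge() then accepts such a response is an assumption drawn from the error message and is not verified here.

```r
# A factor response, standing in for a$Outcome (illustrative values only).
outcome <- factor(c("Alive", "Death", "Death", "Alive"))

# ifelse() needs a logical test; coercing a factor to logical is refused,
# which is the error reported in the post. Capture the message instead of
# stopping.
msg <- tryCatch(
  ifelse(outcome, 1, 0),
  error = function(e) conditionMessage(e)
)
print(msg)

# Recode explicitly as 0/1, treating Death as the event.
y <- as.integer(outcome == "Death")
print(y)
```

Recoding the response only addresses the error message; it does nothing about the separation in the data.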
>>>>>>
>>>>>>
>>>>>>
>>>>>> At this point I do not have an idea how to solve this and would like to
>>>>>> seek help.
>>>>>>
>>>>>> I really really appreciate your input!!!
>>>>>>
>>>>>>     [[alternative HTML version deleted]]
>>>>>>
>>>>>
>>>>>
>>>>> David Winsemius
>>>>> Alameda, CA, USA
>>>>>
>>>
>>> David Winsemius
>>> Alameda, CA, USA
>>>
>
> David Winsemius
> Alameda, CA, USA
>


