[R] Collinearity? Cannot get logisticRidge{ridge} to work
Kengo Inagaki
kengoing.gj at gmail.com
Thu May 28 00:06:01 CEST 2015
I had not understood complete separation very well.
Thank you very much for clarification.
Kengo
2015-05-27 17:03 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
>
> On May 27, 2015, at 3:00 PM, Kengo Inagaki wrote:
>
>> Here is the result-
>>
>>> with(a, table(Sex, Therapy1, Outcome) )
>> , , Outcome = Alive
>>
>>         Therapy1
>> Sex      no yes
>>   female  0   4
>>   male    4   5
>>
>> , , Outcome = Death
>>
>>         Therapy1
>> Sex      no yes
>>   female  6   3
>>   male    3   0
>
> So no survivors among females without Therapy1, and no deaths in the opposite cell (males with Therapy1). Complete separation.
>
> --
> David.
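
For readers who want to reproduce the point, here is a minimal sketch that rebuilds the 25 cases from the three-way table quoted above and refits the additive model from the original post. The object names `cells`, `xt` and `coefs` are illustrative only; the counts are the ones posted in this thread. Under these assumptions the fit should show the very large standard errors and p-values near 1 described in the original question.

```r
# Rebuild the 25 cases from the three-way table quoted above.
# expand.grid() varies Sex fastest, then Therapy1, then Outcome.
cells <- expand.grid(Sex = c("female", "male"),
                     Therapy1 = c("no", "yes"),
                     Outcome = c("Alive", "Death"),
                     stringsAsFactors = TRUE)
cells$n <- c(0, 4, 4, 5,   # Alive: female/no, male/no, female/yes, male/yes
             6, 3, 3, 0)   # Death: same cell order
a <- cells[rep(seq_len(nrow(cells)), cells$n), c("Sex", "Therapy1", "Outcome")]

# The two empty cells: no surviving females without Therapy1,
# and no deaths among males with Therapy1.
xt <- with(a, table(Sex, Therapy1, Outcome))
xt["female", "no", "Alive"]
xt["male", "yes", "Death"]

# Refit the additive model from the original post; any warnings are silenced.
fit <- suppressWarnings(
  glm(Outcome ~ Sex + Therapy1, data = a, family = binomial)
)
coefs <- coef(summary(fit))
print(round(coefs, 3))
```

The two-way tables cannot reveal this, because the empty cells only appear once Sex, Therapy1 and Outcome are crossed together.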
>
>>
>>
>> 2015-05-27 16:57 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
>>>
>>> On May 27, 2015, at 2:49 PM, Kengo Inagaki wrote:
>>>
>>>> Thank you very much for your rapid response. I sincerely appreciate your input.
>>>> I am sorry for sending the previous email in HTML format.
>>>>
>>>> with(a, table(Sex, Therapy1) ) shows the following.
>>>>
>>>>         Therapy1
>>>> Sex      no yes
>>>>   female  6   7
>>>>   male    7   5
>>>>
>>>> and with(a, table(Sex, Outcome) ) and with(a, table(Therapy1, Outcome) )
>>>> give the following:
>>>>
>>>>         Outcome
>>>> Sex      Alive Death
>>>>   female     4     9
>>>>   male       9     3
>>>>
>>>>         Outcome
>>>> Therapy1 Alive Death
>>>>      no      4     9
>>>>      yes     9     3
>>>
>>> Then what about:
>>>
>>> with(a, table(Sex, Therapy1, Outcome) )
>>>
>>> --
>>> David
>>>
>>>
>>>>
>>>> As there are no zero cells, it does not seem to be complete separation.
>>>> I would really appreciate comments.
>>>>
>>>> Kengo Inagaki
>>>> Memphis, TN
>>>>
>>>>
>>>> 2015-05-27 13:57 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
>>>>>
>>>>> On May 27, 2015, at 10:10 AM, Kengo Inagaki wrote:
>>>>>
>>>>>> I am currently working on a health care related project using R. I am
>>>>>> learning R while working on data analysis.
>>>>>>
>>>>>> Below is the part of the data in which I am encountering a problem.
>>>>>>
>>>>>>
>>>>>> Case# Sex Therapy1 Therapy2 Outcome
>>>>>>
>>>>>> 1 male no no Alive
>>>>>>
>>>>>
>>>>> snipped mangled data sent in HTML
>>>>>
>>>>>>
>>>>>>
>>>>>> "Outcome" is the response variable and "Sex", "Therapy1", "Therapy2" are
>>>>>> predictor variables.
>>>>>>
>>>>>> All of the predictors are significantly associated with the outcome by
>>>>>> univariate analysis.
>>>>>>
>>>>>> Logistic regression runs fine with most of the predictors when "Sex" and
>>>>>> "Therapy1" are not included at the same time (this is part of a table that
>>>>>> I cut out from a larger table for ease of presentation, and there are more
>>>>>> predictors that I tested).
>>>>>
>>>>> Please examine the data before reaching for ridge regression:
>>>>>
>>>>> What does this show: ...
>>>>>
>>>>> with(a, table(Sex, Therapy1) )
>>>>>
>>>>> I predict you will see a zero cell entry. Then read about "complete separation" and the so-called "Hauck-Donner effect".
>>>>>
>>>>> --
>>>>> David.
>>>>>>
>>>>>> However, when "Sex" and "Therapy1" are included in the logistic regression
>>>>>> model at the same time, the standard errors inflate and the p-values get close to 1.
>>>>>>
>>>>>> The formula used is,
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Model <- glm(Outcome ~ Sex + Therapy1, data = a, family = binomial)
>>>>>> # "a" is the data frame holding the table above.
>>>>>>
>>>>>>
>>>>>>
>>>>>> After doing some reading, I suspect this might be collinearity, as vif
>>>>>> values (using "vif()" function in car package) were sky high (8,875,841 for
>>>>>> both "Sex" and "Therapy1").
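
A rough way to check whether the two predictors are actually collinear is sketched below, using the cell counts from the three-way table posted elsewhere in this thread. It assumes, as far as I understand `car::vif()` for a fitted glm, that the reported value is derived from the estimated coefficient covariance matrix; the hand calculation here mimics that for two single-coefficient terms and is not the package's code. The names `r_pred`, `vif_pred` and `vif_model` are illustrative.

```r
# Rebuild the 25 cases from the counts posted in this thread.
cells <- expand.grid(Sex = c("female", "male"),
                     Therapy1 = c("no", "yes"),
                     Outcome = c("Alive", "Death"),
                     stringsAsFactors = TRUE)
cells$n <- c(0, 4, 4, 5, 6, 3, 3, 0)   # Alive cells, then Death cells
a <- cells[rep(seq_len(nrow(cells)), cells$n), c("Sex", "Therapy1", "Outcome")]

# Correlation between the two predictors themselves (0/1 coded).
x_sex <- as.numeric(a$Sex == "male")
x_trt <- as.numeric(a$Therapy1 == "yes")
r_pred <- cor(x_sex, x_trt)
vif_pred <- 1 / (1 - r_pred^2)   # classical VIF from the predictors alone

# VIF computed from the fitted model's coefficient correlations
# (intercept dropped), which is what blows up under separation.
fit <- suppressWarnings(
  glm(Outcome ~ Sex + Therapy1, data = a, family = binomial)
)
R <- cov2cor(vcov(fit))[-1, -1]
vif_model <- 1 / (1 - R[1, 2]^2)

c(r_pred = r_pred, vif_pred = vif_pred, vif_model = vif_model)
```

If the predictors are only weakly correlated while the model-based figure is enormous, the inflation comes from the unstable coefficient estimates rather than from collinearity between Sex and Therapy1.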
>>>>>>
>>>>>> Learning that ridge regression may be a solution, I attempted using
>>>>>> logisticRidge {ridge} with the following call, but I get the
>>>>>> accompanying error message.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> logisticRidge(a$Outcome~a$Sex+a$Therapy1)
>>>>>>
>>>>>>
>>>>>>
>>>>>> Error in ifelse(y, log(p), log(1 - p)) :
>>>>>>
>>>>>> invalid to change the storage mode of a factor
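
The error text itself points at the response being a factor: base R's `ifelse()` tries to coerce its test argument to logical, and that coercion is refused for factors. The sketch below reproduces the message outside of `logisticRidge()` with made-up probabilities (`p` and `y_factor` are hypothetical illustration values, not data from the thread) and shows that a 0/1 recoding avoids it.

```r
# Hypothetical response and fitted probabilities, for illustration only.
y_factor <- factor(c("Alive", "Death", "Death", "Alive"))
p <- c(0.2, 0.7, 0.6, 0.3)

# A factor as the test argument of ifelse() raises the error quoted above.
res <- tryCatch(ifelse(y_factor, log(p), log(1 - p)),
                error = function(e) e)
inherits(res, "error")
conditionMessage(res)

# Recoding the response as numeric 0/1 (1 = Death) works.
y01 <- as.numeric(y_factor == "Death")
loglik_terms <- ifelse(y01, log(p), log(1 - p))
loglik_terms

# Untested assumption: with a 0/1 response column in the data frame, e.g.
#   a$Dead <- as.numeric(a$Outcome == "Death")
# a call such as logisticRidge(Dead ~ Sex + Therapy1, data = a) would at
# least get past this particular error.
```

Whether the ridge fit is then sensible is a separate question, given the separation discussed elsewhere in this thread.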
>>>>>>
>>>>>>
>>>>>>
>>>>>> At this point I have no idea how to solve this and would like to
>>>>>> seek help.
>>>>>>
>>>>>> I really really appreciate your input!!!
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> David Winsemius
>>>>> Alameda, CA, USA
>>>>>