[R] Collinearity? Cannot get logisticRidge{ridge} to work
David Winsemius
dwinsemius at comcast.net
Thu May 28 00:03:28 CEST 2015
On May 27, 2015, at 3:00 PM, Kengo Inagaki wrote:
> Here is the result:
>
>> with(a, table(Sex, Therapy1, Outcome) )
> , , Outcome = Alive
>
>         Therapy1
> Sex      no yes
>   female  0   4
>   male    4   5
>
> , , Outcome = Death
>
>         Therapy1
> Sex      no yes
>   female  6   3
>   male    3   0
So no survivors when Female had no-Therapy1 and no deaths with the opposite for those variables (male with Therapy1). Quasi-complete separation.
--
David.
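To check this diagnosis from the counts alone, the base-R sketch below rebuilds a data frame "a" holding the 25 cases implied by the three-way table above (a reconstruction for illustration, not the original data file) and computes the proportion of deaths within each Sex by Therapy1 combination. A proportion of exactly 1 or 0 marks a combination whose outcome never varies, which is what lets the fitted coefficients drift toward infinity.

```r
# Rebuild the 25 cases from the Sex x Therapy1 x Outcome counts quoted in this thread
# (a reconstruction for illustration; the real data frame may hold more columns)
cells <- expand.grid(
  Sex      = c("female", "male"),
  Therapy1 = c("no", "yes"),
  Outcome  = c("Alive", "Death")
)
# expand.grid varies Sex fastest, then Therapy1, then Outcome
cells$n <- c(0, 4, 4, 5,   # Alive: female/no, male/no, female/yes, male/yes
             6, 3, 3, 0)   # Death: same order
a <- cells[rep(seq_len(nrow(cells)), cells$n), c("Sex", "Therapy1", "Outcome")]
rownames(a) <- NULL

# Proportion of deaths within each Sex x Therapy1 combination
tab <- with(a, table(Sex, Therapy1, Outcome))
p_death <- prop.table(tab, margin = c(1, 2))[, , "Death"]
print(p_death)
# female/no is 1 (everyone died) and male/yes is 0 (nobody died):
# these pure combinations are the quasi-complete separation
```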
>
>
> 2015-05-27 16:57 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
>>
>> On May 27, 2015, at 2:49 PM, Kengo Inagaki wrote:
>>
>>> Thank you very much for your rapid response. I sincerely appreciate your input.
>>> I am sorry for sending the previous email in HTML format.
>>>
>>> with(a, table(Sex, Therapy1) ) shows the following.
>>>         Therapy1
>>> Sex      no yes
>>>   female  6   7
>>>   male    7   5
>>>
>>> and with(a, table(Sex, Outcome) ) and with(a, table(Therapy1, Outcome) )
>>> give the following:
>>>
>>>         Outcome
>>> Sex      Alive Death
>>>   female     4     9
>>>   male       9     3
>>>
>>>         Outcome
>>> Therapy1 Alive Death
>>>   no         4     9
>>>   yes        9     3
>>
>> Then what about:
>>
>> with(a, table(Sex, Therapy1, Outcome) )
>>
>> --
>> David
>>
>>
>>>
>>> As there are no zero cells, it does not seem to be complete separation.
>>> I would really appreciate any comments.
>>>
>>> Kengo Inagaki
>>> Memphis, TN
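The missing step in this reasoning is that separation depends on the joint pattern of all predictors in the model, not on each two-way table separately. A minimal base-R sketch, again rebuilding the 25 cases from the counts posted in this thread (an assumption; the real data frame may differ), checks every two-way table and then the full three-way table for empty cells:

```r
# Rebuild the 25 cases from the three-way counts posted in this thread
cells <- expand.grid(
  Sex      = c("female", "male"),
  Therapy1 = c("no", "yes"),
  Outcome  = c("Alive", "Death")
)
cells$n <- c(0, 4, 4, 5, 6, 3, 3, 0)   # Alive then Death, Sex varying fastest
a <- cells[rep(seq_len(nrow(cells)), cells$n), c("Sex", "Therapy1", "Outcome")]

# Every pair of variables: does any two-way table contain an empty cell?
var_pairs <- list(c("Sex", "Therapy1"), c("Sex", "Outcome"), c("Therapy1", "Outcome"))
two_way_zero <- sapply(var_pairs, function(v) any(table(a[v]) == 0))
print(two_way_zero)   # FALSE FALSE FALSE

# The full Sex x Therapy1 x Outcome table: count its empty cells
three_way <- table(a)
print(sum(three_way == 0))   # 2 (female/no/Alive and male/yes/Death)
```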
>>>
>>>
>>> 2015-05-27 13:57 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
>>>>
>>>> On May 27, 2015, at 10:10 AM, Kengo Inagaki wrote:
>>>>
>>>>> I am currently working on a health care related project using R. I am
>>>>> learning R while working on data analysis.
>>>>>
>>>>> Below is the part of the data with which I am encountering a problem.
>>>>>
>>>>>
>>>>> Case#  Sex   Therapy1  Therapy2  Outcome
>>>>> 1      male  no        no        Alive
>>>>
>>>> snipped mangled data sent in HTML
>>>>
>>>>>
>>>>>
>>>>> "Outcome" is the response variable and "Sex", "Therapy1", "Therapy2" are
>>>>> predictor variables.
>>>>>
>>>>> All of the predictors are significantly associated with the outcome by
>>>>> univariate analysis.
>>>>>
>>>>> Logistic regression runs fine with most of the predictors when "Sex" and
>>>>> "Therapy1" are not included at the same time (this is part of a table that
>>>>> I cut out from a larger table for ease of presentation; there are more
>>>>> predictors that I tested).
>>>>
>>>> Please examine the data before reaching for ridge regression:
>>>>
>>>> What does this show: ...
>>>>
>>>> with(a, table(Sex, Therapy1) )
>>>>
>>>> I predict you will see a zero cell entry. Then read about "complete separation" and the so-called "Hauck-Donner effect".
>>>>
>>>> --
>>>> David.
>>>>>
>>>>> However, when "Sex" and "Therapy1" are included in the logistic regression
>>>>> model at the same time, the standard errors inflate and the p-values get close to 1.
>>>>>
>>>>> The call used is:
>>>>>
>>>>>
>>>>>
>>>>>> Model <- glm(Outcome ~ Sex + Therapy1, data = a, family = binomial)  # "a" is a data frame holding the table above
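With a fit like this, the Wald z-tests reported by summary() are unreliable (the Hauck-Donner effect is the general form of this problem), while likelihood-ratio tests remain usable. The sketch below refits the same model on data rebuilt from the three-way counts posted in this thread (an assumption; the real data frame may differ) and compares base R's summary() Wald p-values with drop1(test = "LRT").

```r
# Rebuild the 25 cases from the three-way counts posted in this thread
cells <- expand.grid(
  Sex      = c("female", "male"),
  Therapy1 = c("no", "yes"),
  Outcome  = c("Alive", "Death")
)
cells$n <- c(0, 4, 4, 5, 6, 3, 3, 0)
a <- cells[rep(seq_len(nrow(cells)), cells$n), c("Sex", "Therapy1", "Outcome")]

# glm() models P(Outcome == "Death"); a warning that fitted probabilities
# numerically 0 or 1 occurred is expected, so report it as a message
Model <- withCallingHandlers(
  glm(Outcome ~ Sex + Therapy1, data = a, family = binomial),
  warning = function(w) {
    message("glm warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Wald tests: the huge standard errors push the p-values toward 1
wald <- summary(Model)$coefficients[, c("Estimate", "Std. Error", "Pr(>|z|)")]
print(wald)

# Likelihood-ratio tests: drop each term in turn and compare deviances
lrt <- drop1(Model, test = "LRT")
print(lrt)
```

On these reconstructed counts the Wald p-values for Sexmale and Therapy1yes should sit near 1, while dropping either term raises the deviance substantially, so both likelihood-ratio p-values are small.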
>>>>>
>>>>>
>>>>>
>>>>> After doing some reading, I suspect this might be collinearity, as the VIF
>>>>> values (from the vif() function in the car package) were sky high (8,875,841 for
>>>>> both "Sex" and "Therapy1").
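A VIF in the millions usually signals near-collinear predictors, but here the predictors themselves are barely related (the Sex by Therapy1 counts are 6/7 versus 7/5). As far as I understand car's implementation, vif() on a glm is computed from the correlation matrix of the coefficient estimates, so under separation it mostly reflects the diverging estimates rather than the design. The sketch below assumes the data rebuilt from the posted counts and computes both quantities by hand in base R (it is not car's code). With two 1-df terms, VIF = 1/(1 - r^2) for the relevant correlation r.

```r
# Rebuild the 25 cases from the three-way counts posted in this thread
cells <- expand.grid(
  Sex      = c("female", "male"),
  Therapy1 = c("no", "yes"),
  Outcome  = c("Alive", "Death")
)
cells$n <- c(0, 4, 4, 5, 6, 3, 3, 0)
a <- cells[rep(seq_len(nrow(cells)), cells$n), c("Sex", "Therapy1", "Outcome")]

# Separation warnings are expected here and suppressed for brevity
Model <- suppressWarnings(glm(Outcome ~ Sex + Therapy1, data = a, family = binomial))

# (1) Collinearity of the predictors: correlation of the two 0/1 dummy columns
X <- model.matrix(Model)[, -1]                 # drop the intercept column
r_x <- cor(X[, "Sexmale"], X[, "Therapy1yes"])
vif_design <- 1 / (1 - r_x^2)

# (2) Correlation of the two coefficient estimates, taken from vcov()
R_b <- cov2cor(vcov(Model))
r_b <- R_b["Sexmale", "Therapy1yes"]
vif_coef <- 1 / (1 - r_b^2)

print(c(r_x = r_x, vif_design = vif_design, r_b = r_b, vif_coef = vif_coef))
```

The design-based value stays close to 1, so ridge regression aimed at collinearity addresses the wrong problem; the enormous coefficient-based value is a symptom of the separation.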
>>>>>
>>>>> Learning that ridge regression may be a solution, I attempted to use
>>>>> logisticRidge {ridge} with the following call, but I get the
>>>>> accompanying error message.
>>>>>
>>>>>
>>>>>
>>>>>> logisticRidge(a$Outcome~a$Sex+a$Therapy1)
>>>>>
>>>>>
>>>>>
>>>>> Error in ifelse(y, log(p), log(1 - p)) :
>>>>>   invalid to change the storage mode of a factor
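This message comes from base R's ifelse(), which coerces its first argument to logical and refuses to do so for a factor. So the response handed to logisticRidge() appears to need to be numeric 0/1 rather than a factor like Outcome (an inference from the error text, not checked against the ridge documentation). The sketch below reproduces the error with a made-up three-element response and shows that a 0/1 recoding lets the same expression run.

```r
# Hypothetical response and fitted probabilities, just to reproduce the message
y_factor <- factor(c("Alive", "Death", "Death"), levels = c("Alive", "Death"))
p <- c(0.2, 0.7, 0.6)

# Same expression as in the error: ifelse() cannot coerce a factor to logical
err <- tryCatch(ifelse(y_factor, log(p), log(1 - p)),
                error = function(e) conditionMessage(e))
print(err)   # "invalid to change the storage mode of a factor"

# Recode the outcome as numeric 0/1 (1 = Death); the expression now works
y01 <- as.numeric(y_factor == "Death")
print(ifelse(y01, log(p), log(1 - p)))   # log(0.8), log(0.7), log(0.6)
```

For the thread's data the analogous step would be a numeric column such as a hypothetical a$Death01 <- as.numeric(a$Outcome == "Death") used as the response; this removes the error but does not by itself address the separation.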
>>>>>
>>>>>
>>>>>
>>>>> At this point I have no idea how to solve this and would like to
>>>>> seek help.
>>>>>
>>>>> I really really appreciate your input!!!
>>>>>
>>>>>
>>>>
>>>>
>>>> David Winsemius
>>>> Alameda, CA, USA
>>>>
>>
>>
More information about the R-help mailing list