[R] Dependent Variable in Logistic Regression
John Fox
jfox sending from mcmaster.ca
Sat Aug 1 21:01:29 CEST 2020
Dear Paul,
I think that this thread has gotten unnecessarily complicated. The
answer, as is easily demonstrated, is that a binary response for a
binomial GLM in glm() may be a factor, a numeric variable, or a logical
variable, with identical results; for example:
--------------- snip -------------
> set.seed(123)
> head(x <- rnorm(100))
[1] -0.56047565 -0.23017749 1.55870831 0.07050839 0.12928774 1.71506499
> head(y <- rbinom(100, 1, 1/(1 + exp(-x))))
[1] 0 1 1 1 1 0
> head(yf <- as.factor(y))
[1] 0 1 1 1 1 0
Levels: 0 1
> head(yl <- y == 1)
[1] FALSE TRUE TRUE TRUE TRUE FALSE
> glm(y ~ x, family=binomial)

Call:  glm(formula = y ~ x, family = binomial)

Coefficients:
(Intercept)            x
     0.3995       1.1670

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:      134.6
Residual Deviance: 114.9    AIC: 118.9

> glm(yf ~ x, family=binomial)

Call:  glm(formula = yf ~ x, family = binomial)

Coefficients:
(Intercept)            x
     0.3995       1.1670

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:      134.6
Residual Deviance: 114.9    AIC: 118.9

> glm(yl ~ x, family=binomial)

Call:  glm(formula = yl ~ x, family = binomial)

Coefficients:
(Intercept)            x
     0.3995       1.1670

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:      134.6
Residual Deviance: 114.9    AIC: 118.9
--------------- snip -------------
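A caveat on the factor case, offered as a sketch and not as part of the demonstration above: per ?family, a factor response in a binomial GLM treats the first level as failure and all other levels as success. The name yf_rev and the use of relevel() are illustrative assumptions; reversing the level order should only flip the signs of the coefficients.

```r
# Illustrative sketch: same simulated data as in the transcript above.
set.seed(123)
x <- rnorm(100)
y <- rbinom(100, 1, 1/(1 + exp(-x)))

# Default level order: "0" is the first level, so "1" counts as success.
yf <- factor(y, levels = c(0, 1))

# Hypothetical variant: make "1" the first level, so "0" counts as success.
yf_rev <- relevel(yf, ref = "1")

b     <- coef(glm(yf ~ x, family = binomial))
b_rev <- coef(glm(yf_rev ~ x, family = binomial))

# The two fits model complementary events, so the coefficients
# should agree up to sign (within numerical tolerance).
same_up_to_sign <- isTRUE(all.equal(b, -b_rev, tolerance = 1e-6))
print(same_up_to_sign)
```

So with a yes/no factor it is worth checking levels() before interpreting the signs.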
The original poster claimed to have encountered an error with a 0/1
numeric response, but didn't show any data or even a command. I suspect
that the response was a character variable, but of course can't really
know that.
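
To illustrate that suspicion (and it is only a suspicion, since the original data were never shown), here is a minimal sketch with a made-up character response. The names y_chr, res, fit_factor, and fit_logical are hypothetical. The sketch only checks that glm() signals an error for the character response; it does not rely on the exact wording of the message.

```r
# Illustrative sketch with simulated data; not the original poster's data.
set.seed(123)
x <- rnorm(100)
y <- rbinom(100, 1, 1/(1 + exp(-x)))

# A character response, as might result from importing a text column.
y_chr <- ifelse(y == 1, "yes", "no")

# Capture the condition instead of halting the script.
res <- tryCatch(glm(y_chr ~ x, family = binomial),
                error = function(e) e)
print(inherits(res, "error"))

# Fix 1: convert to a factor, with the failure category as the first level.
fit_factor <- glm(factor(y_chr, levels = c("no", "yes")) ~ x,
                  family = binomial)

# Fix 2: compare against the success label to get a logical response.
fit_logical <- glm((y_chr == "yes") ~ x, family = binomial)

# Both conversions should give the same coefficients.
print(all.equal(coef(fit_factor), coef(fit_logical)))
```

Either conversion gives glm() a response type it accepts.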
Best,
John
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/
On 2020-08-01 2:25 p.m., Paul Bernal wrote:
> Dear friend,
>
> I am aware that I have a binomial dependent variable, which is covid status
> (1 if covid positive, and 0 otherwise).
>
> My question was whether R requires a binomial response variable to be
> turned into a factor or not, that's all.
>
> Cheers,
>
> Paul
>
> On Sat, Aug 1, 2020 at 1:22 PM, Bert Gunter <bgunter.4567 using gmail.com>
> wrote:
>
>> ... yes, but so does lm() for a categorical **INdependent** variable with
>> more than 2 numerically labeled levels. n levels = (n-1) df for a
>> categorical covariate, but 1 for a continuous one (unless more complex
>> models are explicitly specified of course). As I said, the OP seems
>> confused about whether he is referring to the response or covariates. Or
>> maybe he just made the same typo I did.
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along and
>> sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Sat, Aug 1, 2020 at 11:15 AM Patrick (Malone Quantitative) <
>> malone using malonequantitative.com> wrote:
>>
>>> No, R does not. glm() does in order to do logistic regression.
>>>
>>> On Sat, Aug 1, 2020 at 2:11 PM Paul Bernal <paulbernal07 using gmail.com>
>>> wrote:
>>>
>>>> Hi Bert,
>>>>
>>>> Thank you for the kind reply.
>>>>
>>>> But what if I don't turn the variable into a factor? Let's say that in
>>>> Excel I just coded the variable as 1s and 0s, imported the dataset
>>>> into R, and fitted the logistic regression without turning any categorical
>>>> variable or dummy variable into a factor.
>>>>
>>>> Does R require every dummy variable to be treated as a factor?
>>>>
>>>> Best regards,
>>>>
>>>> Paul
>>>>
>>>> On Sat, Aug 1, 2020 at 12:59 PM, Bert Gunter <
>>>> bgunter.4567 using gmail.com> wrote:
>>>>
>>>>> x <- factor(0:1)
>>>>> x <- factor(c("no","yes"))
>>>>>
>>>>> will produce identical results up to labeling.
>>>>>
>>>>>
>>>>> Bert Gunter
>>>>>
>>>>> "The trouble with having an open mind is that people keep coming along
>>>>> and sticking things into it."
>>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>>
>>>>>
>>>>> On Sat, Aug 1, 2020 at 10:40 AM Paul Bernal <paulbernal07 using gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Dear friends,
>>>>>>
>>>>>> Hope you are doing great. I want to fit a logistic regression in R, where
>>>>>> the dependent variable is the covid status (I used 1 for covid positives,
>>>>>> and 0 for covid negatives), but when I ran the glm, R complains that I
>>>>>> should make the dependent variable a factor.
>>>>>>
>>>>>> What would be more advisable, to keep the dependent variable with 1s and
>>>>>> 0s, or code it as yes/no and then make it a factor?
>>>>>>
>>>>>> Any guidance will be greatly appreciated,
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Paul
>>>>>>
>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Patrick S. Malone, Ph.D., Malone Quantitative
>>> NEW Service Models: http://malonequantitative.com
>>>
>>> He/Him/His
>>>
>>
>