[R] Dependent Variable in Logistic Regression

John Fox
Sat Aug 1 21:01:29 CEST 2020


Dear Paul,

I think that this thread has gotten unnecessarily complicated. The 
answer, as is easily demonstrated, is that a binary response for a 
binomial GLM in glm() may be a factor, a numeric variable, or a logical 
variable, with identical results; for example:

--------------- snip -------------

 > set.seed(123)

 > head(x <- rnorm(100))
[1] -0.56047565 -0.23017749  1.55870831  0.07050839  0.12928774  1.71506499

 > head(y <- rbinom(100, 1, 1/(1 + exp(-x))))
[1] 0 1 1 1 1 0

 > head(yf <- as.factor(y))
[1] 0 1 1 1 1 0
Levels: 0 1

 > head(yl <- y == 1)
[1] FALSE  TRUE  TRUE  TRUE  TRUE FALSE

 > glm(y ~ x, family=binomial)

Call:  glm(formula = y ~ x, family = binomial)

Coefficients:
(Intercept)            x
      0.3995       1.1670

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:	    134.6
Residual Deviance: 114.9 	AIC: 118.9

 > glm(yf ~ x, family=binomial)

Call:  glm(formula = yf ~ x, family = binomial)

Coefficients:
(Intercept)            x
      0.3995       1.1670

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:	    134.6
Residual Deviance: 114.9 	AIC: 118.9

 > glm(yl ~ x, family=binomial)

Call:  glm(formula = yl ~ x, family = binomial)

Coefficients:
(Intercept)            x
      0.3995       1.1670

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:	    134.6
Residual Deviance: 114.9 	AIC: 118.9

--------------- snip -------------
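
The same comparison can be made programmatically rather than by eye. The following is a minimal sketch, assuming the simulated data above; the object names b_num, b_fac, b_log, yf_rev, and b_rev are introduced here purely for illustration. It also illustrates the convention documented in ?family that, for a factor response, the first level denotes failure and all other levels denote success, so changing the reference level should flip the signs of the coefficients.

```r
# Recreate the simulated data from the transcript above.
set.seed(123)
x  <- rnorm(100)
y  <- rbinom(100, 1, 1/(1 + exp(-x)))
yf <- as.factor(y)   # factor with levels "0" and "1"
yl <- y == 1         # logical

# Fit the same model with each representation of the response.
b_num <- coef(glm(y  ~ x, family = binomial))
b_fac <- coef(glm(yf ~ x, family = binomial))
b_log <- coef(glm(yl ~ x, family = binomial))

# Check that the three coefficient vectors agree.
same_coefs <- isTRUE(all.equal(b_num, b_fac)) &&
              isTRUE(all.equal(b_num, b_log))
print(same_coefs)

# For a factor response the first level is "failure". Making "1" the
# reference level therefore models P(y == 0), which negates the coefficients.
yf_rev  <- relevel(yf, ref = "1")
b_rev   <- coef(glm(yf_rev ~ x, family = binomial))
flipped <- isTRUE(all.equal(b_num, -b_rev))
print(flipped)
```

If a factor response is coded with labels such as "yes" and "no", it is worth checking levels() before interpreting the signs, since the default level order is alphabetical.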

The original poster claimed to have encountered an error with a 0/1 
numeric response, but didn't show any data or even a command. I suspect 
that the response was a character variable, but of course can't really 
know that.
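
If the response was in fact imported as a character variable, one possible remedy is to convert it explicitly before fitting. The following is only a sketch under that assumption: the "yes"/"no" coding and the names ych, outcome, yf2, and ynum are hypothetical, since the original data were never shown. The attempt to fit with the character response is wrapped in tryCatch() so that the script runs whatever glm() does with it.

```r
# Simulated data as in the transcript above.
set.seed(123)
x <- rnorm(100)
y <- rbinom(100, 1, 1/(1 + exp(-x)))

# Hypothetical character-coded response, as might arise on import.
ych <- ifelse(y == 1, "yes", "no")
print(is.character(ych))

# Try the fit with the character response and record what happens,
# without assuming a particular outcome or error message.
outcome <- tryCatch({
  glm(ych ~ x, family = binomial)
  "fit succeeded"
}, error = function(e) paste("error:", conditionMessage(e)))
print(outcome)

# Option 1: convert to a factor, stating the level order explicitly so that
# "no" (the first level) is failure and "yes" is success.
yf2   <- factor(ych, levels = c("no", "yes"))
b_fac <- coef(glm(yf2 ~ x, family = binomial))

# Option 2: convert to a 0/1 numeric variable.
ynum  <- as.numeric(ych == "yes")
b_num <- coef(glm(ynum ~ x, family = binomial))

# Both conversions should reproduce the fit to the original 0/1 response.
b_ref   <- coef(glm(y ~ x, family = binomial))
matches <- isTRUE(all.equal(b_fac, b_ref)) && isTRUE(all.equal(b_num, b_ref))
print(matches)
```

Calling str() on the imported data frame is a quick way to see whether the response column is numeric, factor, logical, or character.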

Best,
  John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2020-08-01 2:25 p.m., Paul Bernal wrote:
> Dear friend,
> 
> I am aware that I have a binomial dependent variable, which is covid status
> (1 if covid positive, and 0 otherwise).
> 
> My question was whether R requires a binomial response variable to be turned
> into a factor or not, that's all.
> 
> Cheers,
> 
> Paul
> 
> On Sat, Aug 1, 2020 at 1:22 PM, Bert Gunter <bgunter.4567 using gmail.com>
> wrote:
> 
>> ... yes, but so does lm() for a categorical **INdependent** variable with
>> more than 2 numerically labeled levels. n levels  = (n-1) df for a
>> categorical covariate, but 1 for a continuous one (unless more complex
>> models are explicitly specified of course). As I said, the OP seems
>> confused about whether he is referring to the response or covariates. Or
>> maybe he just made the same typo I did.
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along and
>> sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Sat, Aug 1, 2020 at 11:15 AM Patrick (Malone Quantitative) <
>> malone using malonequantitative.com> wrote:
>>
>>> No, R does not. glm() does in order to do logistic regression.
>>>
>>> On Sat, Aug 1, 2020 at 2:11 PM Paul Bernal <paulbernal07 using gmail.com>
>>> wrote:
>>>
>>>> Hi Bert,
>>>>
>>>> Thank you for the kind reply.
>>>>
>>>> But what if I don't turn the variable into a factor? Let's say that in
>>>> Excel I just coded the variable as 1s and 0s, imported the dataset into R,
>>>> and fitted the logistic regression without turning any categorical
>>>> variable or dummy variable into a factor.
>>>>
>>>> Does R require every dummy variable to be treated as a factor?
>>>>
>>>> Best regards,
>>>>
>>>> Paul
>>>>
>>>> On Sat, Aug 1, 2020 at 12:59 PM, Bert Gunter <
>>>> bgunter.4567 using gmail.com> wrote:
>>>>
>>>>> x <- factor(0:1)
>>>>> x <- factor(c("yes","no"))
>>>>>
>>>>> will produce identical results up to labeling.
>>>>>
>>>>>
>>>>> Bert Gunter
>>>>>
>>>>> "The trouble with having an open mind is that people keep coming along and
>>>>> sticking things into it."
>>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>>
>>>>>
>>>>> On Sat, Aug 1, 2020 at 10:40 AM Paul Bernal <paulbernal07 using gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Dear friends,
>>>>>>
>>>>>> Hope you are doing great. I want to fit a logistic regression in R, where
>>>>>> the dependent variable is the covid status (I used 1 for covid positives,
>>>>>> and 0 for covid negatives), but when I ran the glm, R complains that I
>>>>>> should make the dependent variable a factor.
>>>>>>
>>>>>> What would be more advisable, to keep the dependent variable with 1s and
>>>>>> 0s, or code it as yes/no and then make it a factor?
>>>>>>
>>>>>> Any guidance will be greatly appreciated,
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Paul
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Patrick S. Malone, Ph.D., Malone Quantitative
>>> NEW Service Models: http://malonequantitative.com
>>>
>>> He/Him/His
>>>
>>
> 


