[R] [FORGED] Response Variable Coding
Rolf Turner
r.turner at auckland.ac.nz
Thu Aug 20 06:11:07 CEST 2015
On 20/08/15 09:43, Abraham Mathew wrote:
> Very simple question that I want to confirm.
>
> Let's say that I have a response variable. What are the appropriate ways
> that it can be coded for a logistic regression model?
>
> 1. It can be 0/1 and a factor
> 2. It can be 1/2 and a factor
> 3. It can be characters and a factor, where the second level is treated as
> 1 (bad/good becomes 0/1).
> 4. ?
> 5. ?
>
>
> My question is: are 1, 2, and 3 all correct, and are there other coding
> schemes that glm can take?
When in doubt, RTFM! :-)
From ?binomial:
> For the binomial and quasibinomial families the response can be
> specified in one of three ways:
>
> As a factor: ‘success’ is interpreted as the factor not having the first
> level (and hence usually of having the second level).
>
> As a numerical vector with values between 0 and 1, interpreted as the
> proportion of successful cases (with the total number of cases given by
> the weights).
>
> As a two-column integer matrix: the first column gives the number of
> successes and the second the number of failures.
That pretty well says it all. One thing to note: if the response is a
*numeric* vector of 0's and 1's, it will produce the same result as it
would if it were converted to a factor. (This is because the default
weights are all 1, so each 0 or 1 is read as the proportion of successes
in a single case.)
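A minimal sketch of the point above, using simulated data. The variable names (x, y01, y_fac and so on), the seed, and the simulation settings are made up for illustration and are not from the original question. It fits the same model with the response coded as a factor, as a numeric 0/1 vector, and as a two-column matrix, and checks that the coefficients agree. It also shows that a *numeric* 1/2 response, unlike a 1/2 factor, is rejected.

```r
# Simulated data; all names and settings here are hypothetical.
set.seed(42)
n   <- 200
x   <- rnorm(n)
y01 <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))

# Factor with character levels: levels sort as "bad", "good",
# so "good" (not the first level) is the success.
y_fac   <- factor(ifelse(y01 == 1, "good", "bad"))
fit_fac <- glm(y_fac ~ x, family = binomial)

# Factor with levels "1" and "2": level "2" is the success.
y_12   <- factor(y01 + 1)
fit_12 <- glm(y_12 ~ x, family = binomial)

# Numeric 0/1 vector: proportions of success with default weights of 1.
fit_num <- glm(y01 ~ x, family = binomial)

# Two-column matrix: successes in column 1, failures in column 2.
fit_mat <- glm(cbind(y01, 1 - y01) ~ x, family = binomial)

# All four codings should give the same coefficient estimates.
same_fac <- isTRUE(all.equal(coef(fit_fac), coef(fit_num)))
same_12  <- isTRUE(all.equal(coef(fit_12),  coef(fit_num)))
same_mat <- isTRUE(all.equal(coef(fit_mat), coef(fit_num)))
print(c(same_fac, same_12, same_mat))

# A *numeric* 1/2 response is not a valid coding: the binomial family
# requires numeric values between 0 and 1, so glm() stops with an error.
y12_num <- y01 + 1
res <- tryCatch(glm(y12_num ~ x, family = binomial),
                error = function(e) e)
numeric_12_fails <- inherits(res, "error")
print(numeric_12_fails)
```

If a different level should count as the success, one option is to reorder the factor first, for example with relevel(), since only the first level is treated as failure.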
HTH
cheers,
Rolf Turner
--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276