[R] categorical variables

Uwe Ligges ligges at statistik.tu-dortmund.de
Mon Dec 12 21:38:34 CET 2011



On 12.12.2011 19:36, Brian Jensvold wrote:
> I am doing a logistic regression, and by accident I included a field
> labeled "st", which holds the two-letter abbreviation for each of the
> 50 states.  I was surprised to see that glm did not raise an error but
> instead appears to have automatically broken this field down into
> individual fields (stAK and stAL).  Does R really know to turn all
> categorical variables into binary dummy variables?

Yes.
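
To see the expansion for yourself, here is a minimal sketch. The data
frame `d` and its values are made up for illustration; only the column
name "st" comes from the question. model.matrix() builds the same kind
of design matrix that glm() constructs internally from a formula, and a
character column would be converted to a factor in the same way.

```r
# Hypothetical toy data: 'st' holds state abbreviations as a factor.
d <- data.frame(st = factor(c("AK", "AL", "AZ", "AK", "AL", "AZ")))

# Build the design matrix for the formula ~ st.
# With the default treatment contrasts, each non-base level gets its
# own 0/1 indicator column named <variable><level>.
X <- model.matrix(~ st, data = d)

colnames(X)
print(X)
```

The first level (here "AK") gets no column of its own.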

> I have tried answering the question on my own and have found:
>
> When including categorical variables in a regression, the default in R
> is to set the first level as the base.  Is there an option to specify a
> different level as the base?

Well, reorder the levels of the factor so that the most appropriate base
level comes first. This simplifies life, since it is from then on the
base level for all the models you try to fit.
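
One way to do the reordering is relevel() from the stats package, which
moves a chosen level to the front. A small sketch, with a made-up
vector `st` and "CA" picked arbitrarily as the new base:

```r
# Hypothetical factor of state abbreviations.
st <- factor(c("AK", "AL", "CA", "CA", "AL"))
levels(st)    # levels are sorted alphabetically by default

# Move "CA" to the first position so that it becomes the base level.
# The data values themselves are unchanged; only the level order is.
st2 <- relevel(st, ref = "CA")
levels(st2)
```

Calling factor(st, levels = ...) with the full set of levels in the
desired order should achieve the same thing when you want to control
the whole ordering rather than just the first level.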


> My next/same question is: what does it mean to "set the first level as
> the base"?  Does this mean it turns each value into a unique binary
> result?

What is a "unique binary result"?

Actually, the base level is included in the intercept of your model, and
the coefficients of the other levels are the differences from that base
level.
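
A small sketch of what that means for a logistic regression with a
single factor predictor. The data below are invented for illustration
(three states, ten observations each). In this simple case the
intercept is the log-odds of the base level, and each other coefficient
is the difference in log-odds relative to the base.

```r
# Hypothetical data: a binary outcome y for three states.
d <- data.frame(
  st = factor(rep(c("AK", "AL", "AZ"), each = 10)),
  y  = c(rep(c(1, 0), c(2, 8)),   # AK: 2 successes out of 10
         rep(c(1, 0), c(5, 5)),   # AL: 5 successes out of 10
         rep(c(1, 0), c(7, 3)))   # AZ: 7 successes out of 10
)

fit <- glm(y ~ st, family = binomial, data = d)
b <- coef(fit)

# Observed proportion of successes per state.
p <- tapply(d$y, d$st, mean)

# Intercept versus log-odds of the base level "AK".
all.equal(unname(b["(Intercept)"]), qlogis(p[["AK"]]), tolerance = 1e-5)

# Coefficient stAZ versus the log-odds difference between AZ and AK.
all.equal(unname(b["stAZ"]), qlogis(p[["AZ"]]) - qlogis(p[["AK"]]),
          tolerance = 1e-5)
```

With more predictors in the model the interpretation is the same, but
the numbers no longer match the raw per-level proportions.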

Uwe Ligges





