[R] What does an lm() coefficient mean when the model is given a categorical predictor of string values?

Peter Dalgaard pdalgd at gmail.com
Tue Oct 4 18:45:10 CEST 2016

> On 04 Oct 2016, at 17:39 , mviljamaa <mviljamaa at kapsi.fi> wrote:
> I'm using lm() for a model with a predictor that takes two values, {poika, tyttö} (boy and girl in Finnish).
> I fit a model with this categorical variable:
> fit1 <- lm(dta$X.U.FEFF..mpist. ~ dta$sukup + dta$HISEI + dta$SES)
> While the variable is named dta$sukup here, lm() returns a coefficient named
> dta$sukuptyttö
>   -6.19756
> What does the appended 'tyttö' in the name mean? Does it mean that 'tyttö' has been coded as 1 and 'poika' as 0?

Short answer: Yes.

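Long answer: Yes, if the treatment-contrast parametrization is being used. That is R's default for unordered factors (contr.treatment): the first factor level, here 'poika' because levels are sorted alphabetically by default, is the baseline absorbed into the intercept, and the dta$sukuptyttö coefficient is the estimated difference for 'tyttö' relative to 'poika', holding dta$HISEI and dta$SES fixed.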
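A minimal sketch of what the treatment coding looks like, assuming a small made-up data set: only the level labels 'poika' and 'tyttö' come from the question; the data frame, the outcome y and its values are hypothetical. It inspects the design matrix lm() built and checks that the coefficient is the girl-minus-boy difference.

```r
# Hypothetical toy data standing in for the poster's dta: only the two
# level labels come from the question; y is invented for illustration.
dta <- data.frame(
  y     = c(10, 12, 11, 13, 5, 6, 4, 7),
  sukup = rep(c("poika", "tytt\u00f6"), each = 4)  # "\u00f6" is the letter ö
)

# Passing data = dta (instead of writing dta$... in the formula) keeps the
# coefficient name short: "sukuptyttö" rather than "dta$sukuptyttö".
fit <- lm(y ~ sukup, data = dta)

# The character column is converted to a factor with levels sorted
# alphabetically, so "poika" is the baseline level.
print(fit$xlevels)

# The design matrix shows the 0/1 dummy lm() actually fitted:
# column 2 is 0 for every "poika" row and 1 for every "tyttö" row.
X <- model.matrix(fit)
print(data.frame(sukup = dta$sukup, dummy = X[, 2]))

# With treatment contrasts the coefficient equals mean(tyttö) - mean(poika)
# (exactly so here, because sukup is the only predictor).
group_means <- tapply(dta$y, dta$sukup, mean)
print(c(coefficient   = unname(coef(fit)[2]),
        diff_of_means = unname(group_means[2] - group_means[1])))
```

In the question's model, which also contains HISEI and SES, the coefficient is still attached to the same 0/1 dummy, but it is then the difference adjusted for those covariates rather than the raw difference of group means.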

See help(contrasts) for a lead-in to an even longer answer.
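As a small illustration of why the answer depends on the parametrization, here is a sketch on the same kind of hypothetical toy data (values invented). It shows the default contrast matrix, then changes the baseline with relevel() and switches to sum-to-zero coding via lm()'s contrasts argument; the coefficient's meaning changes each time.

```r
# Hypothetical toy data; only the level labels come from the question.
dta <- data.frame(
  y     = c(10, 12, 11, 13, 5, 6, 4, 7),
  sukup = factor(rep(c("poika", "tytt\u00f6"), each = 4))  # "\u00f6" is ö
)

# R's defaults: contr.treatment for unordered factors.
print(options("contrasts"))
print(contrasts(dta$sukup))   # poika -> 0, tyttö -> 1

# Make 'tyttö' the baseline instead: the coefficient is now named
# "sukuppoika" and measures boys relative to girls (sign flips).
dta_girls_ref <- dta
dta_girls_ref$sukup <- relevel(dta_girls_ref$sukup, ref = "tytt\u00f6")
fit_relevel <- lm(y ~ sukup, data = dta_girls_ref)
print(coef(fit_relevel))

# Sum-to-zero contrasts code poika as +1 and tyttö as -1, so the
# coefficient is half the boy-girl difference and the intercept is
# the unweighted mean of the two group means.
fit_sum <- lm(y ~ sukup, data = dta, contrasts = list(sukup = "contr.sum"))
print(coef(fit_sum))
```

The fitted values are identical in all three parametrizations; only the interpretation of the individual coefficients changes.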

Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
