[R] What does lm() output coefficient mean when it's been given a categorical predictor of string values?
peter dalgaard
pdalgd at gmail.com
Tue Oct 4 18:45:10 CEST 2016
> On 04 Oct 2016, at 17:39 , mviljamaa <mviljamaa at kapsi.fi> wrote:
>
> I'm using lm() for a model that has a predictor that has two values {poika, tyttö} (boy and girl in Finnish).
>
> I make a model with this categorical variable:
>
> fit1 <- lm(dta$X.U.FEFF..mpist. ~ dta$sukup + dta$HISEI + dta$SES)
>
> and while the variable/vector is here named as dta$sukup, what lm() returns is a coefficient
>
> dta$sukuptyttö
> -6.19756
>
> What does the added 'tyttö' in the variable mean? Does it mean that 'tyttö' has been interpreted as 1 and 'poika' as 0?
Short answer: Yes.
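Long answer: Yes, if treatment contrast parametrization is being used,
which is the default for unordered factors. lm() converts the character
vector to a factor, takes the first level ('poika') as the reference, and
reports the 'tyttö' coefficient as the estimated difference from that
reference with the other predictors held fixed.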
See help(contrasts) for a lead-in to an even longer answer.
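
To see the 0/1 coding directly, here is a minimal sketch on made-up data. The
data frame, the column name `score`, and all values are invented for
illustration and are not the poster's data. It assumes the default
options("contrasts"), i.e. contr.treatment for unordered factors.

```r
# Hypothetical toy data: only the level names come from the question.
dta <- data.frame(
  sukup = c("poika", "tyttö", "poika", "tyttö", "poika", "tyttö"),
  score = c(10, 7, 12, 8, 11, 6)
)

# Make the factor and its level order explicit; the first level is the reference.
dta$sukup <- factor(dta$sukup, levels = c("poika", "tyttö"))

# Coding matrix under treatment contrasts: poika -> 0, tyttö -> 1.
print(contrasts(dta$sukup))

# With a single two-level factor, the intercept is the mean of the reference
# group and the slope is the difference between the two group means.
fit <- lm(score ~ sukup, data = dta)
print(coef(fit))

group_means <- tapply(dta$score, dta$sukup, mean)
print(group_means[2] - group_means[1])  # same value as the slope
```

Passing `data = dta` instead of writing `dta$sukup` in the formula gives the
shorter coefficient name `sukuptyttö`. To use 'tyttö' as the baseline instead,
relevel(dta$sukup, ref = "tyttö") changes the reference level before fitting.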
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com