[R] What does lm() output coefficient mean when it's been given a categorical predictor of string values?

Michael Dewey lists at dewey.myzen.co.uk
Wed Oct 5 11:09:22 CEST 2016


See my comments inline.

On 04/10/2016 16:39, mviljamaa wrote:
> I'm using lm() for a model that has a predictor that has two values
> {poika, tyttö} (boy and girl in Finnish).
>
> I make a model with this categorical variable:
>
> fit1 <- lm(dta$X.U.FEFF..mpist. ~ dta$sukup + dta$HISEI + dta$SES)
>

Your code will be easier to read if you pass the data frame through the
data argument instead of repeating dta$ in every term:

  fit1 <- lm(X.U.FEFF..mpist. ~ sukup + HISEI + SES, data = dta)
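
As a rough illustration of what changes between the two forms, here is a
small sketch on made-up data. The column names sukup and HISEI follow the
thread; the values and the response column score are hypothetical and
stand in for the real data set.

```r
# Hypothetical data: column names follow the thread, the values are invented.
set.seed(1)
dta <- data.frame(
  sukup = rep(c("poika", "tyttö"), each = 10),
  HISEI = rnorm(20, mean = 50, sd = 10)
)
# Hypothetical response, standing in for the real outcome column.
dta$score <- 500 + 0.5 * dta$HISEI + rnorm(20, sd = 20)

# Same model written both ways.
fit_dollar <- lm(dta$score ~ dta$sukup + dta$HISEI)
fit_data   <- lm(score ~ sukup + HISEI, data = dta)

names(coef(fit_dollar))  # term labels carry the "dta$" prefix
names(coef(fit_data))    # term labels use the bare column names

# The estimates agree; only the labels differ.
all.equal(unname(coef(fit_dollar)), unname(coef(fit_data)))
```

Under these assumptions the two fits give the same numbers, so the choice
is about readable output rather than a different model.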


> and while the variable/vector is here named as dta$sukup, what lm()
> returns is a coefficient
>
> dta$sukuptyttö
>      -6.19756
>
> What does the added 'tyttö' in the variable mean? Does it mean that
> 'tyttö' has been interpreted as 1 and 'poika' as 0?


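Yes. lm() turns the character vector into a factor whose levels are, by
default, sorted alphabetically, so 'poika' is the reference level (coded 0)
and 'tyttö' is coded 1. The coefficient is the estimated difference for
tyttö relative to poika with the other predictors held fixed.

If you would like it the other way round then see ?relevel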
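
To see the coding that lm() applies to a two-level predictor, and what
relevel() changes, the following sketch may help. It again uses made-up
data: the values, the response column score and the helper column
sukup_rev are hypothetical, and the default treatment contrasts are
assumed.

```r
# Hypothetical data: column names follow the thread, the values are invented.
set.seed(2)
dta <- data.frame(
  sukup = rep(c("poika", "tyttö"), times = 10),
  HISEI = rnorm(20, mean = 50, sd = 10)
)
dta$score <- 500 - 6 * (dta$sukup == "tyttö") + 0.5 * dta$HISEI +
  rnorm(20, sd = 20)

# Make the factor explicit; lm() would do this conversion itself.
dta$sukup <- factor(dta$sukup)
levels(dta$sukup)     # the first level is the reference level
contrasts(dta$sukup)  # the 0/1 column used for the non-reference level

fit_a <- lm(score ~ sukup + HISEI, data = dta)

# Make the other level the reference, as described in ?relevel.
dta$sukup_rev <- relevel(dta$sukup, ref = "tyttö")
fit_b <- lm(score ~ sukup_rev + HISEI, data = dta)

coef(fit_a)[2]  # difference for the second level relative to the first
coef(fit_b)[2]  # same size, opposite sign, after swapping the reference
```

Swapping the reference level changes the intercept and flips the sign of
the group coefficient; the fitted values stay the same.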

-- 
Michael
http://www.dewey.myzen.co.uk/home.html
