[R] Centering multi-level unordered factors
David Winsemius
dwinsemius at comcast.net
Tue Oct 8 03:53:40 CEST 2013
On Oct 7, 2013, at 4:52 PM, Robert Lynch wrote:
> I have a question I am not even sure quite how to ask.
>
> When R fits models with unordered categorical variables as predictors
> (RHS of the model), it automatically converts each one into dichotomous
> variables, one fewer than there are levels.
>
> For example, if I had levels(trait) = c("A","B","C"), it would automatically
> recode it to
>
>     NewVar1 NewVar2
> A         0       0
> B         1       0
> C         0       1
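A minimal sketch of where that coding comes from, assuming R's default `options(contrasts = c("contr.treatment", "contr.poly"))`: unordered factors get treatment contrasts, so the first level is the baseline and each remaining level gets its own 0/1 column after the intercept.

```r
# `trait` as in the question: an unordered factor with levels A, B, C
trait <- factor(c("A", "B", "C"))

# Default coding for unordered factors: treatment contrasts, first level = baseline
contr.treatment(levels(trait))

# The same 0/1 columns appear in the design matrix, after the intercept column
X <- model.matrix(~ trait)
X
```

If the `contrasts` option has been changed in the session, the columns will differ accordingly.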
>
> What I would like to know is whether there is a way to "center" these
> categorical variables, and if so, how.
>
> For continuous variables it is simple:
> x <- x - mean(x)
You can choose different contrasts. Take a look at contr.sum():
> trait <- factor(1:3, labels = c("A","B","C"))
> contrasts(trait) <- contr.sum(3)
> model.matrix( ~trait )
  (Intercept) trait1 trait2
1           1      1      0
2           1      0      1
3           1     -1     -1
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$trait
  [,1] [,2]
A    1    0
B    0    1
C   -1   -1
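To see what the sum-to-zero coding buys you, here is a small sketch on made-up response values (the data are hypothetical, three observations per level). In a model containing only `trait`, the intercept is the unweighted mean of the level means, `trait1` and `trait2` are the deviations of A and B from it, and the deviation for C is minus their sum.

```r
# Hypothetical data: three observations per level of `trait`
trait <- factor(rep(c("A", "B", "C"), each = 3))
y <- c(1, 2, 3, 4, 5, 6, 10, 11, 12)  # level means: A = 2, B = 5, C = 11

contrasts(trait) <- contr.sum(3)
fit <- lm(y ~ trait)
cf <- coef(fit)

# Named vector of level means (A, B, C)
level_means <- sapply(split(y, trait), mean)

# Intercept: the unweighted mean of the level means
cf[["(Intercept)"]]
# Deviations of A and B from that mean ...
cf[c("trait1", "trait2")]
# ... and the deviation for C, recovered because the effects sum to zero
-sum(cf[c("trait1", "trait2")])
```

With unbalanced data the intercept is still the unweighted mean of the level means, not the overall sample mean of `y`.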
--
David.
>
> For a single dichotomous variable it is not so hard:
> gender <- gender - sum(gender)/length(gender)
> where gender is coded, for example, as (0,1) or (-.5,.5).
> This would give a gender coefficient that still reflects the
> difference between the two genders, while the intercept and the other
> coefficients would apply to someone of "average gender".
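The two-level recipe can be checked with a small sketch on hypothetical data: after centering a 0/1 `gender` column, the slope is still the difference between the two group means, and in a model with no other predictors the intercept becomes the overall sample mean of `y`.

```r
# Hypothetical data: gender coded 0/1, unequal group sizes
gender <- c(0, 0, 0, 1, 1)
y <- c(2, 4, 6, 10, 12)

# Same as gender - sum(gender)/length(gender)
gender_c <- gender - mean(gender)
fit <- lm(y ~ gender_c)

coef(fit)
```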
>
> It is that last part I am unclear on for a multi-level (3 or more levels)
> factor. How do you set up variables so that the *other* coefficients
> reflect the average across the factor levels? Do I need two or three
> centered variables? And is there a quick way to get all those variables
> if my factor has many levels, e.g. 14?
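One way to extend the gender recipe to a factor with k levels, offered as a sketch rather than the only approach: build the k - 1 treatment dummies with `model.matrix()`, drop the intercept column, and center each column with `scale(..., scale = FALSE)`. k - 1 centered columns are enough (two for three levels, 13 for 14 levels), because all k centered columns would sum to zero and be collinear with each other. The data below are hypothetical and deliberately unbalanced.

```r
# Hypothetical unbalanced data for a 3-level factor; the same code works for 14 levels
trait <- factor(c("A", "A", "B", "B", "B", "C", "C", "C", "C"))
y <- c(1, 3, 4, 5, 6, 9, 10, 11, 12)

# k - 1 treatment dummies (intercept column dropped), then center each column
D <- model.matrix(~ trait)[, -1, drop = FALSE]
D_c <- scale(D, center = TRUE, scale = FALSE)

fit_c <- lm(y ~ D_c)
coef(fit_c)

# For comparison: ordinary treatment coding gives the same slopes
fit_t <- lm(y ~ trait)
coef(fit_t)
```

Centering only shifts the intercept: the slopes match the treatment-coded fit, and with `trait` as the only predictor the intercept equals `mean(y)`, a count-weighted average, whereas `contr.sum()` gives the unweighted mean of the level means. If the other covariates are centered too, the intercept is the fitted value at the sample mean of every column, i.e. at the observed mix of levels.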
>
>
> Robert
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA