[R-sig-eco] [R-sig-phylo] GEE and AIC
Emmanuel Paradis
Emmanuel.Paradis at mpl.ird.fr
Mon Jun 2 15:34:58 CEST 2008
Le 30.05.2008 02:18, tgarland at ucr.edu a écrit :
> This is not a problem for contrasts of GLS-type methods. It can be a
> pain in the a** to code a whole bunch of categorical variables as dummy
> variables and then compute contrasts (depending on your software), but
> it is not a problem from the perspective of the math/stats.
You're right to add "depending on your software" because it's very easy
with R (saving the user's a**), but few people seem to know it. The
function model.matrix is the tool here. Suppose you have a factor with
four levels and two repetitions for each level:
R> (x <- gl(4, 2))
[1] 1 1 2 2 3 3 4 4
Levels: 1 2 3 4
Then you just specify to model.matrix that x is a predictor in the usual
R formula notation:
R> model.matrix(~ x)
(Intercept) x2 x3 x4
1 1 0 0 0
2 1 0 0 0
3 1 1 0 0
4 1 1 0 0
5 1 0 1 0
6 1 0 1 0
7 1 0 0 1
8 1 0 0 1
attr(,"assign")
[1] 0 1 1 1
attr(,"contrasts")
attr(,"contrasts")$x
[1] "contr.treatment"
This is actually used to prepare the "model matrix" in a regression,
hence the column of 1's added for the intercept. If you just want the
dummy variables:
R> model.matrix(~ x)[, -1]
x2 x3 x4
1 0 0 0
2 0 0 0
3 1 0 0
4 1 0 0
5 0 1 0
6 0 1 0
7 0 0 1
8 0 0 1
If you've got several such variables you can do model.matrix( ~ x+y+z),
possibly adding interactions too. If they are in a data.frame, then use
the option 'data'.
If you don't like the current coding, you can change it in the options
(see ?C):
R> options(contrasts = c("contr.sum", "contr.poly"))
R> model.matrix(~ x)[, -1]
x1 x2 x3
1 1 0 0
2 1 0 0
3 0 1 0
4 0 1 0
5 0 0 1
6 0 0 1
7 -1 -1 -1
8 -1 -1 -1
model.matrix() is called internally by most regression functions (lm,
glm, gee, gls, lme, ...) so the user doesn't have to bother about it,
but it's nice to see how it works.
--
Emmanuel Paradis
IRD, Jakarta, Indonesia
http://ape.mpl.ird.fr/
More information about the R-sig-ecology
mailing list