[R] question on model.matrix

Mon Jan 30 17:54:31 CET 2012

Greetings

On Sat, Jan 28, 2012 at 2:43 PM, Daniel Negusse
<daniel.negusse at my.mcphs.edu> wrote:

>> While reading some tutorials, I came across this and I am stuck. I want to understand it and would appreciate it if anyone could explain it to me.
>>
>> design <- model.matrix(~ -1+factor(c(1,1,2,2,3,3)))
>>
>> Can someone break down this code and explain to me what the "~" and the "-1+factor" are doing?

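A formula normally has the form y ~ x, with the response on the left
of the "~" and the predictors on the right.  When you leave out y, you
get a one-sided formula that describes only the right hand side
variables, which is all that is needed to build a design matrix.  The
term "design matrix" generally means the numeric coding of the
predictors that is actually used when a statistical procedure is
fitted.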
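
To see the one-sided versus two-sided point concretely, here is a minimal sketch.  The data frame d and its columns y and x are made up for illustration and are not from the original question.  It only shows that the left hand side does not change the design matrix.

```r
# Illustrative data; the names d, y and x are assumptions for this sketch.
d <- data.frame(y = c(2.1, 3.9, 6.2), x = c(1, 2, 3))

f_two <- y ~ x   # two-sided formula: response ~ predictors
f_one <- ~ x     # one-sided formula: predictors only

length(f_two)    # 3: the "~", the left side, the right side
length(f_one)    # 2: the "~" and the right side

# The design matrix is built from the right hand side only,
# so both formulas give the same columns and values.
X_two <- model.matrix(f_two, data = d)
X_one <- model.matrix(f_one, data = d)
colnames(X_one)
all(X_two == X_one)
```

Both matrices should have an "(Intercept)" column of ones and an "x" column, since neither formula removed the intercept.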

The -1 in the formula means "do not insert an intercept for me."  It
affects the way the factor variable is converted to numeric contrasts
in the design matrix.  If there is an intercept, then one level of the
factor has to be dropped (or the contrasts otherwise adjusted) to
prevent perfect multicollinearity, because the dummy columns for all
the levels add up to the intercept column.
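
Applied to the line from the original question, that looks roughly like this.  It is a sketch under the default treatment contrasts; the variable names f, design_no_int and design_int are mine, chosen only to keep the column names short.

```r
# The factor from the question: three groups, two observations each.
f <- factor(c(1, 1, 2, 2, 3, 3))

# No intercept: one indicator (dummy) column per level.
design_no_int <- model.matrix(~ -1 + f)
colnames(design_no_int)   # "f1" "f2" "f3"

# With an intercept: the first level becomes the baseline and is dropped.
design_int <- model.matrix(~ f)
colnames(design_int)      # "(Intercept)" "f2" "f3"

# Each row of the no-intercept matrix has exactly one 1 in it.
rowSums(design_no_int)
```

Writing ~ 0 + f is another way to ask for no intercept and should give the same matrix as ~ -1 + f.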

If you run a few examples, you will see.  These use lm, but the formula
and design matrix ideas are the same.  Note that with an intercept I
get 3 dummy variables from x2 (the first level, "none", is the
baseline), but with no intercept I get 4 dummies, one per level:

> x1 <- rnorm(16)
> x2 <- gl(4, 4, labels=c("none","some","more","lots"))
> y <- rnorm(16)
> m1 <- lm(y ~ x1 + x2)
> model.matrix(m1)
   (Intercept)          x1 x2some x2more x2lots
1            1 -0.25600007      0      0      0
2            1  0.94963659      0      0      0
3            1  0.06915561      0      0      0
4            1  0.89971204      0      0      0
5            1  0.73817482      1      0      0
6            1  2.92451195      1      0      0
7            1 -0.80682449      1      0      0
8            1  1.07472998      1      0      0
9            1  1.34949123      0      1      0
10           1 -0.42203984      0      1      0
11           1 -1.66316740      0      1      0
12           1 -2.83232063      0      1      0
13           1  1.26177313      0      0      1
14           1  0.10359857      0      0      1
15           1 -1.85671242      0      0      1
16           1 -0.25140729      0      0      1
attr(,"assign")
[1] 0 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$x2
[1] "contr.treatment"

> m2 <- lm(y ~ -1 + x1 + x2)
> model.matrix(m2)
            x1 x2none x2some x2more x2lots
1  -0.25600007      1      0      0      0
2   0.94963659      1      0      0      0
3   0.06915561      1      0      0      0
4   0.89971204      1      0      0      0
5   0.73817482      0      1      0      0
6   2.92451195      0      1      0      0
7  -0.80682449      0      1      0      0
8   1.07472998      0      1      0      0
9   1.34949123      0      0      1      0
10 -0.42203984      0      0      1      0
11 -1.66316740      0      0      1      0
12 -2.83232063      0      0      1      0
13  1.26177313      0      0      0      1
14  0.10359857      0      0      0      1
15 -1.85671242      0      0      0      1
16 -0.25140729      0      0      0      1
attr(,"assign")
[1] 1 2 2 2 2
attr(,"contrasts")
attr(,"contrasts")$x2
[1] "contr.treatment"
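
If you want to check the multicollinearity point yourself, one rough way is to glue an intercept column onto the full set of 4 dummies and look at the rank.  This is only a sketch; X_full and X_over are names I made up, and qr() is just one of several ways to get a numerical rank.

```r
# Same 4-level factor as in the lm examples above.
x2 <- gl(4, 4, labels = c("none", "some", "more", "lots"))

# All 4 dummies, no intercept.
X_full <- model.matrix(~ -1 + x2)

# The 4 dummy columns sum to a column of ones in every row ...
rowSums(X_full)

# ... so adding an intercept column makes the matrix rank deficient:
# 5 columns, but only 4 of them are linearly independent.
X_over <- cbind(Intercept = 1, X_full)
ncol(X_over)       # 5
qr(X_over)$rank    # 4

# The default coding with an intercept avoids this by dropping "none".
qr(model.matrix(~ x2))$rank   # 4, and it has only 4 columns
```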

I think you'll need to mess about with R basics like plot and lm
before you go off using the formulas that you really care about.
Otherwise, well, you'll always be lost about stuff like "~" and "-1".

I've started posting all my lecture notes (source code, R code, pdf
output) at http://pj.freefaculty.org/guides.  That might be a quick
start for you.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas