[R] converting factors to dummy variables

Tim Calkins tim.calkins at gmail.com
Wed Dec 5 04:39:45 CET 2007


Hi all -

I'm trying to find a way to create dummy variables from factors in a
regression.  I have been using biglm along the lines of

library(biglm)

ff <- log(Price) ~ factor(Colour):factor(Store) +
    factor(DummyVar):factor(Colour):factor(Store)

lm1 <- biglm(ff, data=my.dataset)

but because there are lots of colours (>100) and lots of stores
(>250), I run into memory problems.  Not every store sells every
colour, though.  It seems that lm / biglm create a column for every
combination of factor levels in factor(Colour):factor(Store), so by
building the matrix of dummy variables myself, with columns only for
the combinations that actually occur, I should be able to reduce the
size of the problem considerably.
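
To illustrate the size difference, here is a minimal sketch in base R.
The data frame `toy` is a hypothetical example, not my real data: it has
3 colours and 3 stores but only 4 colour/store pairs that actually occur.

```r
# Toy data (hypothetical): 3 colours, 3 stores, only 4 observed pairs
toy <- data.frame(
  Colour = c("red", "blue", "green", "red"),
  Store  = c("a", "b", "c", "c")
)

# Formula interaction: one column per level combination (3 * 3 = 9),
# including pairs that never occur in the data
full <- model.matrix(~ factor(Colour):factor(Store) - 1, data = toy)

# One combined factor with unused levels dropped:
# one level per pair that actually occurs
cs <- interaction(toy$Colour, toy$Store, drop = TRUE)

n_full     <- ncol(full)   # columns in the full interaction matrix
n_observed <- nlevels(cs)  # number of observed colour/store pairs
```

With more than 100 colours and 250 stores the same gap would be far
larger, which is presumably where the memory goes.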

If I have a data frame

my.dataset <- data.frame(Price=1:12, Colour=c('red','blue','green'),
    Store=c('a', 'b', 'c', 'a', 'c', 'd', 'e', 'e', 'e', 'e', 'b', 'e'),
    DummyVar=sort(rep(c(0,1),6)))

I want to create a data frame of dummy variables that looks like

red:a	red:e	blue:b	blue:c	blue:e	green:c	green:d	green:e
1	0	0	0	0	0	0	0
0	0	1	0	0	0	0	0
0	0	0	0	0	1	0	0
1	0	0	0	0	0	0	0
0	0	0	1	0	0	0	0
0	0	0	0	0	0	1	0
0	1	0	0	0	0	0	0
0	0	0	0	1	0	0	0
0	0	0	0	0	0	0	1
0	1	0	0	0	0	0	0
0	0	1	0	0	0	0	0
0	0	0	0	0	0	0	1
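
One way a matrix like this might be built is sketched below, assuming
base R only.  The names `cs`, `dummies` and `dummies.by.var` are made up
for illustration, and I have not tried this on the full data set.

```r
my.dataset <- data.frame(
  Price    = 1:12,
  Colour   = c("red", "blue", "green"),  # recycled to 12 rows
  Store    = c("a", "b", "c", "a", "c", "d", "e", "e", "e", "e", "b", "e"),
  DummyVar = sort(rep(c(0, 1), 6))
)

# Combine Colour and Store into one factor, keeping only the pairs that occur
cs <- interaction(my.dataset$Colour, my.dataset$Store, sep = ":", drop = TRUE)

# One indicator column per observed pair; "- 1" drops the intercept
dummies <- model.matrix(~ cs - 1)
colnames(dummies) <- levels(cs)

# Assumed equivalent of the DummyVar interaction: each row of indicators
# is multiplied by that row's 0/1 DummyVar value
dummies.by.var <- dummies * my.dataset$DummyVar
```

The columns come out in the order of levels(cs), so they may be arranged
differently from the table above, but they should be the same 8 pairs.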

Any ideas would be appreciated.


-- 
Tim Calkins
0406 753 997


