[R] converting factors to dummy variables

Henrique Dallazuanna wwwhsd at gmail.com
Wed Dec 5 11:34:18 CET 2007


Try this also:

table(cbind.data.frame(Price=my.dataset$Price,
      Colour=paste(my.dataset$Colour, my.dataset$Store, sep=":")))
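
A possible alternative, offered only as a sketch: the same 0/1 matrix can
probably be built without going through Price by collapsing Colour and Store
into one factor and dropping the combinations that never occur. The names
`cell` and `dummies` are made up for this illustration, and the data frame is
the example from the original post.

```r
# Example data from the original post
my.dataset <- data.frame(
  Price    = 1:12,
  Colour   = c('red', 'blue', 'green'),
  Store    = c('a', 'b', 'c', 'a', 'c', 'd', 'e', 'e', 'e', 'e', 'b', 'e'),
  DummyVar = sort(rep(c(0, 1), 6))
)

# One factor level per Colour:Store pair that actually occurs;
# drop = TRUE discards the combinations with no observations.
cell <- interaction(my.dataset$Colour, my.dataset$Store,
                    sep = ":", drop = TRUE)

# "- 1" removes the intercept, so every observed level gets its own
# 0/1 column instead of one level being absorbed as the baseline.
dummies <- model.matrix(~ cell - 1)
colnames(dummies) <- levels(cell)

dim(dummies)
```

Each row of `dummies` should contain exactly one 1, in the column for that
row's Colour:Store pair. The columns follow the level order of `cell`, which
differs from the column order in the table in the original post.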

On 05/12/2007, Tim Calkins <tim.calkins at gmail.com> wrote:
> Hi all -
>
> I'm trying to find a way to create dummy variables from factors in a
> regression.  I have been using biglm along the lines of
>
> ff <- log(Price) ~ factor(Colour):factor(Store) +
> factor(DummyVar):factor(Colour):factor(Store)
>
> lm1 <- biglm(ff, data=my.dataset)
>
> but because there are lots of colours (>100) and lots of stores
> (>250), I run into memory problems.  Now, not every store sells every
> colour, so it should be possible to create the matrix of factor
> variables myself.  It seems that lm / biglm use all combinations of
> factor levels when given factor(Colour):factor(Store), so by building
> my own matrix of dummy variables I should be able to reduce the size
> of the problem considerably.
>
> If I have a data frame
>
> my.dataset <- data.frame(Price=1:12, Colour=c('red','blue','green'),
>     Store=c('a', 'b', 'c', 'a', 'c', 'd', 'e', 'e', 'e', 'e', 'b', 'e'),
>     DummyVar=sort(rep(c(0,1),6)))
>
> I want to create a data frame with the dummy vars that looks like
>
> red:a   red:e   blue:b  blue:c  blue:e  green:c green:d green:e
> 1       0       0       0       0       0       0       0
> 0       0       1       0       0       0       0       0
> 0       0       0       0       0       1       0       0
> 1       0       0       0       0       0       0       0
> 0       0       0       1       0       0       0       0
> 0       0       0       0       0       0       1       0
> 0       1       0       0       0       0       0       0
> 0       0       0       0       1       0       0       0
> 0       0       0       0       0       0       0       1
> 0       1       0       0       0       0       0       0
> 0       0       1       0       0       0       0       0
> 0       0       0       0       0       0       0       1
>
> Any ideas would be appreciated.
>
>
> --
> Tim Calkins
> 0406 753 997
>
>
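
On the regression itself, one thing that might be worth trying is to put the
collapsed factor straight into the formula, so that only observed
Colour:Store pairs produce columns. This is a rough sketch and not a tested
fix for the memory problem: the column `Cell` is a made-up name, DummyVar is
assumed to stay a numeric 0/1 variable rather than a factor, and plain lm()
stands in for biglm() so that the example needs no extra package.

```r
# Example data from the original post
my.dataset <- data.frame(
  Price    = 1:12,
  Colour   = c('red', 'blue', 'green'),
  Store    = c('a', 'b', 'c', 'a', 'c', 'd', 'e', 'e', 'e', 'e', 'b', 'e'),
  DummyVar = sort(rep(c(0, 1), 6))
)

# Collapse Colour and Store into a single factor with observed levels only.
my.dataset$Cell <- interaction(my.dataset$Colour, my.dataset$Store,
                               sep = ":", drop = TRUE)

# One level effect per observed cell, plus a DummyVar effect within each cell.
ff  <- log(Price) ~ Cell + Cell:DummyVar
lm1 <- lm(ff, data = my.dataset)

nlevels(my.dataset$Cell)
```

With only 12 rows, several coefficients in this toy fit come back as NA
because some cells never see both values of DummyVar. Whether biglm copes with
the same formula on the real data would need to be checked.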


-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


