[R] converting factors to dummy variables
Charles C. Berry
cberry at tajo.ucsd.edu
Wed Dec 5 05:26:39 CET 2007
On Wed, 5 Dec 2007, Tim Calkins wrote:
> Hi all -
>
> I'm trying to find a way to create dummy variables from factors in a
> regression. I have been using biglm along the lines of
>
> ff <- log(Price) ~ factor(Colour):factor(Store) +
> factor(DummyVar):factor(Colour):factor(Store)
>
> lm1 <- biglm(ff, data=my.dataset)
>
> but because there are lots of colours (>100) and lots of stores
> (>250), I run into memory problems. Now, not every store sells every
> colour, so it should be possible to create the matrix of factor
> variables myself and greatly reduce the size of the problem. It seems
> that lm / biglm use all combinations of factor levels when given
> factor(Colour):factor(Store), so by creating my own matrix of factor
> variables I should be able to reduce the size of the problem
> considerably.
>
> If I have a data frame
>
> my.dataset <- data.frame(Price=1:12, Colour=c('red','blue','green'),
>     Store=c('a', 'b', 'c', 'a', 'c', 'd', 'e', 'e', 'e', 'e', 'b', 'e'),
>     DummyVar=sort(rep(c(0,1),6)))
>
> I want to create a data frame with the dummy vars that looks like
>
> red:a red:e blue:b blue:c blue:e green:c green:d green:e
>     1     0      0      0      0       0       0       0
>     0     0      1      0      0       0       0       0
>     0     0      0      0      0       1       0       0
>     1     0      0      0      0       0       0       0
>     0     0      0      1      0       0       0       0
>     0     0      0      0      0       0       1       0
>     0     1      0      0      0       0       0       0
>     0     0      0      0      1       0       0       0
>     0     0      0      0      0       0       0       1
>     0     1      0      0      0       0       0       0
>     0     0      1      0      0       0       0       0
>     0     0      0      0      0       0       0       1
>
> Any ideas would be appreciated.
Use

    mat <- model.matrix( ~ ClrStr - 1,
        transform( my.dataset, ClrStr =
            factor( paste(Colour, Store, sep=":") ) ) )

then pretty up the colnames() and re-order the columns if order matters.
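A minimal sketch of that tidying step, run on the example data from the question. The sub() pattern, the `wanted` column order (copied from the table in the question) and the `mat.dummy` step are illustrative assumptions, not something the suggestion above prescribes; the last step also assumes DummyVar is a numeric 0/1 indicator.

```r
# Example data from the question (Colour is recycled to 12 rows).
my.dataset <- data.frame(
  Price    = 1:12,
  Colour   = c('red', 'blue', 'green'),
  Store    = c('a', 'b', 'c', 'a', 'c', 'd', 'e', 'e', 'e', 'e', 'b', 'e'),
  DummyVar = sort(rep(c(0, 1), 6))
)

# One indicator column per Colour:Store pair that actually occurs.
mat <- model.matrix(~ ClrStr - 1,
                    transform(my.dataset,
                              ClrStr = factor(paste(Colour, Store, sep = ":"))))

# model.matrix() prefixes each level with the variable name; strip it.
colnames(mat) <- sub("^ClrStr", "", colnames(mat))

# Re-order the columns to match the layout asked for (assumed order).
wanted <- c("red:a", "red:e", "blue:b", "blue:c", "blue:e",
            "green:c", "green:d", "green:e")
mat <- mat[, wanted]

# Assumption: DummyVar is 0/1, so the DummyVar:Colour:Store columns are
# the same indicators switched off wherever DummyVar == 0.
mat.dummy <- mat * my.dataset$DummyVar
mat.dummy <- mat.dummy[, colSums(mat.dummy) > 0, drop = FALSE]
```

Each row of mat has exactly one 1, and pairs that never occur get no column at all, which is where the saving over factor(Colour):factor(Store) comes from.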
----
However, if DummyVar is a categorical variable, the model is saturated
over the Colour x Store x DummyVar cells, so the fitted values are just
the cell means. You could compute those means on the appropriate subsets
by maintaining a table of sums and counts. Then in a second pass through
the data get the residual sums of squares. If the data are already in a
database, it might make sense to do these operations there and import the
results to R for further massaging.
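A rough sketch of the two-pass idea, assuming DummyVar is categorical so that every Colour x Store x DummyVar cell gets its own mean. The chunking into three blocks of four rows and the helper name cell_key are hypothetical, chosen only to imitate data that arrive in pieces.

```r
# Example data from the question.
my.dataset <- data.frame(
  Price    = 1:12,
  Colour   = c('red', 'blue', 'green'),
  Store    = c('a', 'b', 'c', 'a', 'c', 'd', 'e', 'e', 'e', 'e', 'b', 'e'),
  DummyVar = sort(rep(c(0, 1), 6))
)

# Pretend the data arrive in three chunks of four rows (hypothetical).
chunks <- split(my.dataset, rep(1:3, each = 4))

# Hypothetical helper: one key per Colour x Store x DummyVar cell.
cell_key <- function(d) paste(d$Colour, d$Store, d$DummyVar, sep = ":")

# Pass 1: keep running per-cell sums and counts of log(Price).
sums <- counts <- numeric(0)
for (ch in chunks) {
  y   <- split(log(ch$Price), cell_key(ch))
  new <- setdiff(names(y), names(sums))   # cells not seen before
  sums[new]   <- 0
  counts[new] <- 0
  sums[names(y)]   <- sums[names(y)]   + vapply(y, sum, numeric(1))
  counts[names(y)] <- counts[names(y)] + lengths(y)
}
means <- sums / counts

# Pass 2: accumulate the residual sum of squares around the cell means.
rss <- 0
for (ch in chunks) {
  rss <- rss + sum((log(ch$Price) - means[cell_key(ch)])^2)
}
```

Only the table of cells that actually occur is held in memory, so the cost grows with the number of observed cells rather than with colours times stores.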
HTH,
Chuck
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901