[R] coded to categorical variables in a large dataset

Gabor Grothendieck ggrothendieck at gmail.com
Fri Dec 29 20:40:47 CET 2006


As Richard has already pointed out, you may only need to convert your
numeric vector to a factor, but just in case, here are a few direct answers:
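
For the factor route, here is a minimal sketch. The data are invented for
illustration: X is a made-up outcome vector coded 1:5 and y is a made-up
response, neither taken from this thread. It only shows that a modelling
function such as lm() expands a factor into indicator columns by itself, so
the event variables often never have to be built by hand.

```r
# Hypothetical data: 100 outcomes coded 1..5 and an arbitrary response
set.seed(1)
X <- rep(1:5, 20)
y <- rnorm(100)

# Convert the numeric codes to a factor
Xf <- factor(X, levels = 1:5)

# lm() builds the indicator columns internally (default treatment contrasts,
# so level 1 is absorbed into the intercept)
fit <- lm(y ~ Xf)
coefs <- names(coef(fit))
print(coefs)
```

With the default contrasts the first level acts as the baseline, which is
why only four indicator coefficients appear next to the intercept.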


Using X from Chuck's post, here are two ways of creating a 100x5
matrix of indicator variables:

model.matrix(~ X-1, list(X = factor(X)))
outer(X, 1:5, "==")+0
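
Since X is defined in Chuck's post rather than in this message, the sketch
below uses a stand-in X (an assumption: 100 values in 1:5) so the two
one-liners can be run on their own and compared.

```r
# Stand-in for the X from the earlier post (assumed: 100 values coded 1..5)
X <- rep(1:5, 20)

# Way 1: model matrix without intercept, one column per factor level
m1 <- model.matrix(~ X - 1, list(X = factor(X)))

# Way 2: compare every element of X with each code; "+ 0" turns
# the logical matrix into a numeric 0/1 matrix
m2 <- outer(X, 1:5, "==") + 0

print(dim(m1))
print(dim(m2))

# Both hold the same 0/1 values; only the dimnames differ
same_values <- all(m1 == m2)
print(same_values)
```

The model.matrix() result carries column names (X1 to X5) and some
attributes, while the outer() result is a bare matrix, so the comparison
uses all(m1 == m2) rather than identical().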

# To create the separate variables event1, ..., event5,
# each one can be built directly:

event1 <- (X == 1) + 0 # and similarly for 2, 3, 4, 5

# or do it in a loop
for(i in 1:5) assign(paste("event", i, sep = ""), (X == i) + 0)

# or create as columns of a data frame
# (f ignores i; mapply only uses that character vector to name the columns)
f <- function(i, j) (X == j) + 0
as.data.frame(mapply(f, paste("event", 1:5, sep = ""), 1:5))
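
To tie this back to the question, here is a hedged sketch on data shaped
like the poster's test2, scaled to 100,000 rows. The object names events,
loop1, same and elapsed are made up for this illustration. It builds all
five indicators in one vectorised step, cross-checks event1 against an
explicit loop on a small slice, and times the vectorised step without
claiming any particular figure.

```r
# Same coding as the sample data in the question, scaled up to 100,000 rows
test2 <- rep(1:5, 20000)

# Build all five indicators at once, then name the columns
events <- as.data.frame(outer(test2, 1:5, "==") + 0)
names(events) <- paste("event", 1:5, sep = "")

# Cross-check event1 against an explicit loop on the first 1,000 rows
n <- 1000
loop1 <- rep(0, n)
for (i in seq_len(n)) if (test2[i] == 1) loop1[i] <- 1
same <- identical(loop1, events$event1[seq_len(n)])
print(same)

# Time the vectorised step on this machine (no expected figure is claimed)
elapsed <- system.time(outer(test2, 1:5, "==") + 0)["elapsed"]
print(elapsed)
```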



On 12/29/06, sj <ssj1364 at gmail.com> wrote:
> I am working with a dataset where there are 5 possible outcomes (coded 1:5).
> I would like to create 5 categorical variables (event1...event5). I am using
> a for loop and if statements, but I have a large dataset (approx. 100,000
> rows), so it takes quite a bit of time. Is there a way to speed this up? Here is
> some sample code of what I am currently doing.
>
> test2 <-rep(seq(1:5),2000)
>
> event1 <- rep(0,length(test2))
> event2 <- rep(0,length(test2))
> event3 <- rep(0,length(test2))
> event4 <- rep(0,length(test2))
> event5 <- rep(0,length(test2))
>
> for(i in 1:length(event1))
> {
>    if (test2[i]==1)
>    {
>        event1[i]=1
>    }
>
>    if (test2[i]==2)
>    {
>        event2[i]=1
>    }
>
>    if (test2[i]==3)
>    {
>        event3[i]=1
>    }
>
>    if (test2[i]==4)
>    {
>        event4[i]=1
>    }
>
>    if (test2[i]==5)
>    {
>        event5[i]=1
>    }
> }
>
>
>
> thanks,
>
> Spencer
>


