[R] coded to categorical variables in a large dataset

Charles C. Berry cberry at tajo.ucsd.edu
Sat Dec 30 00:25:55 CET 2006


On Fri, 29 Dec 2006, sj wrote:

> I am working with a dataset where there are 5 possible outcomes (coded 1:5).
> I would like to create 5 categorical variables (event1...event5). I am using
> a for loop and if statements, but I have a large dataset (approx. 100,000
> rows) and it takes quite a bit of time. Is there a way to speed this up? Here
> is some sample code of what I am currently doing.
>
> test2 <- rep(1:5, 2000)
>
[...]

As Richard suggested, you may not want to do this at all, but ...

If you want these as a matrix, this is fast and direct:

 	mat <- diag(5)[ test2, ]
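Why the indexing works: row k of the 5 x 5 identity matrix diag(5) is 1 in column k and 0 elsewhere, so selecting rows with the codes in test2 returns one indicator row per observation. A minimal sketch, assuming the codes are integers 1 through 5 with no NAs; the event1..event5 column labels and the model.matrix() cross-check are additions for illustration, not part of the original reply.

```r
# Sample data as in the question: codes 1..5 repeated 2000 times
test2 <- rep(1:5, 2000)

# Row k of the identity matrix is the indicator vector for code k,
# so indexing rows by the codes gives one indicator row per observation
mat <- diag(5)[test2, ]
colnames(mat) <- paste0("event", 1:5)

# Each row has exactly one 1, in the column matching its code
stopifnot(all(rowSums(mat) == 1))
stopifnot(all(mat[cbind(seq_along(test2), test2)] == 1))

# Cross-check against base R's model.matrix(), which expands a factor
# into one indicator column per level when the intercept is dropped (- 1)
mm <- model.matrix(~ factor(test2) - 1)
stopifnot(all(unname(mm) == unname(mat)))
```

If test2 could have length 1, diag(5)[test2, , drop = FALSE] keeps the result a matrix instead of dropping it to a plain vector.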

If not as a matrix,

 	event1 <- as.numeric( test2 == 1 )
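test2 == 1 compares the whole vector in one call, which is why it beats an element-by-element loop. The questioner's own loop was elided above, so loop_version() below is a hypothetical stand-in written only to show that the two give the same result; no timings are claimed, since they depend on the machine.

```r
test2 <- rep(1:5, 2000)

# Hypothetical element-by-element version (a stand-in, not the
# questioner's elided code): one if/else per observation
loop_version <- function(x, code) {
  out <- numeric(length(x))
  for (j in seq_along(x)) {
    if (x[j] == code) out[j] <- 1 else out[j] <- 0
  }
  out
}

# Vectorized version: one comparison over the whole vector,
# then TRUE/FALSE converted to 1/0
event1 <- as.numeric(test2 == 1)

stopifnot(identical(event1, loop_version(test2, 1)))
# system.time() can be used to compare the two on your own data
```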

is concise and

 	for (i in 1:5) assign(paste("event",i,sep=""), as.numeric( test2==i ))

is about as fast as you can get.
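A design note on the assign() loop: it creates five separate variables in the workspace. If they need to be kept together, a named list or data frame built from the same comparisons may be easier to pass around. A sketch, assuming a data frame is an acceptable container; the name events is illustrative only.

```r
test2 <- rep(1:5, 2000)

# Same comparisons as the assign() loop, collected into one named list
events <- lapply(1:5, function(i) as.numeric(test2 == i))
names(events) <- paste0("event", 1:5)

# Convert to a data frame: one column per outcome code
events <- as.data.frame(events)

stopifnot(identical(names(events), paste0("event", 1:5)))
stopifnot(all(colSums(events) == 2000))
```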

HTH,

Chuck


Charles C. Berry                        (858) 534-2098
                                          Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0717


