[R] Competing with SPSS and SAS: improving code that loops through rows (data manipulation)

Jim Price price_ja at hotmail.com
Sat Mar 27 00:28:22 CET 2010


Here's my first stab. It removes some of the typical redundancies in your
code (loops, building data frames by adding one column at a time) and
instead does what is probably more canonical R style (although I'm willing
to be corrected, as my code can be a little suspect at times).

For this example, I got a 10-fold speed-up, although I suspect this code
will scale a lot better - primarily because I'm not continually expanding
the data frames one column at a time, but instead working each part out
separately and then sticking them together at the end. The key commands used
(for when you look through the help files) are lapply, do.call, by and
Reduce.
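
As a minimal sketch of that difference (the data frame 'df' and its column
names are made up for illustration and are not part of your code), growing a
data frame column by column and building the pieces first with lapply and
do.call look like this:

```r
# Hypothetical toy data; 'df' and its column names are illustrative only
df <- data.frame(a = c(2, 4, 8), b = c(10, 20, 40))

# Growing approach: add one column per loop iteration
grown <- df
for (nm in names(df)) {
  grown[[paste(nm, "scaled", sep = ".")]] <- df[[nm]] / max(df[[nm]])
}

# Build-then-bind approach: compute every new column first, then bind once
parts <- lapply(names(df), function(nm) df[[nm]] / max(df[[nm]]))
new.cols <- do.call(cbind, parts)
colnames(new.cols) <- paste(names(df), "scaled", sep = ".")
bound <- cbind(df, new.cols)

# Compare the two results
all.equal(grown, bound)
```

On three rows the two are indistinguishable in speed; the second form only
starts to pay off as the number of derived columns grows.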

If you use this scaled up you'd need to play with some of the indices in
places, but I'm sure that's all pretty obvious.
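
One way to avoid hard-coded positions like 3:4 when scaling up is to pick the
measure columns by name. This is only a sketch: 'df', 'id.cols' and
'measure.cols' are hypothetical names, and it assumes the identifier columns
are known in advance.

```r
# Hypothetical data frame laid out like the example: two id columns, then measures
df <- data.frame(group = c("first", "second"), week = 1:2,
                 a = c(3, 5), b = c(40, 70))

# Measure columns are everything that is not an identifier column
id.cols <- c("group", "week")
measure.cols <- setdiff(names(df), id.cols)
measure.cols
```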

Oh, and because this is the usual (and good!) advice - don't call your data
'data':

library(fortunes)
fortune('dog')



# This was your base set-up code
set.seed(123) 
data <- data.frame(group = c(rep("first", 10), rep("second", 10)),
                   week = c(1:10, 1:10),
                   a = abs(round(rnorm(20) * 10, 0)),
                   b = abs(round(rnorm(20) * 100, 0)))
data 


# Set up the ratio variables
system.time({
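# For each measure column, divide every value by its group maximum
temp <- cbind(data, do.call(cbind, lapply(names(data)[3:4], function(.x)
	{
		unlist(by(data, data$group, function(.y) .y[,.x] / max(.y[,.x])))
	})))
# Name the two new columns a.ind.to.max and b.ind.to.max
colnames(temp)[5:6] <- paste(colnames(data)[3:4], 'ind.to.max', sep = '.')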
})
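
For comparison, the same per-group ratio can be sketched with ave(), which
returns its result in the original row order, whereas unlist(by(...)) returns
it group by group and so lines up with the rows only when they are already
sorted by group (as they are here). This is an alternative I have not timed,
and 'toy' is a made-up data frame:

```r
# Hypothetical toy data with two groups of three rows each
toy <- data.frame(group = rep(c("first", "second"), each = 3),
                  a = c(2, 4, 8, 5, 10, 1))

# ave() repeats each group's maximum alongside its rows, so a plain
# vectorised division gives the ratio to the group maximum
toy$a.ind.to.max <- toy$a / ave(toy$a, toy$group, FUN = max)
toy
```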





system.time({
# Every combination of ratio variable, c1 and c2 (2 x 3 x 3 = 18 rows)
constants <- expand.grid(vars = colnames(temp)[5:6], c1 = 1:3,
                         c2 = seq(0.15, 0.45, 0.15))


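# One transformed column per row of 'constants'
results <- lapply(seq(nrow(constants)), function(.x)
	{
		dat <- temp[, as.character(constants[.x, 1])]
		# Note: ^ binds tighter than /, so d evaluates to 0.5 / c1
		d <- exp(1) ^ log(0.5) / constants[.x, 2]
		l <- -10 * log(1 - constants[.x, 3])

		# Within each group, carry the previous result forward (starting from 0)
		unlist(by(dat, temp$group, function(.y) 
			Reduce(function(.u, .v) 1 - ((1 - .u * d) / (exp(1) ^ (.v * l))), .y,
				accumulate = TRUE, init = 0)[-1]))
	})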

final <- cbind(temp, do.call(cbind, results))
colnames(final)[-(1:6)] <- paste(substr(constants$vars, 1, 1), constants$c1,
                                 100 * constants$c2, '..transf', sep = '.')
})
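
To see what the Reduce call is doing, here is a small sketch with made-up
numbers ('x', 'd' and 'l' below are illustrative values, not ones taken from
the grid of constants). With accumulate = TRUE each result is fed into the
next step, init = 0 supplies the starting value, and [-1] drops that starting
value from the output again:

```r
# Illustrative inputs only
x <- c(0.2, 0.5, 1.0)
d <- 0.5
l <- 1

# One step of the recursion: 'prev' is the previous result, 'cur' the current value
step <- function(prev, cur) 1 - ((1 - prev * d) / exp(cur * l))

# Running version via Reduce
via.reduce <- Reduce(step, x, accumulate = TRUE, init = 0)[-1]

# The same recursion written as an explicit loop
via.loop <- numeric(length(x))
prev <- 0
for (i in seq_along(x)) {
  via.loop[i] <- step(prev, x[i])
  prev <- via.loop[i]
}

# Compare the two results
all.equal(via.reduce, via.loop)
```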





Jim Price.
Cardiome Pharma Corp.




