[R] Yet another set of codes to optimize
Daren Tan
daren76 at hotmail.com
Fri Dec 5 03:41:23 CET 2008
I am having trouble converting my dataset from long to wide format. Previous attempts using the reshape package and the aggregate function were unsuccessful because they took too long. My own simplified solution turns out to be just as slow.
My complete code is given below. With sample.size = 10000, execution takes about 20 seconds, but with sample.size = 100000 it seems to take an eternity. My actual sample.size is 15000000, i.e. 15 million.
sample.size <- 10000
# Simulated long-format data: one row per (Name, Type, Predictor) observation
m <- data.frame(Name=sample(1:100000, sample.size, TRUE), Type=sample(1:1000, sample.size, TRUE), Predictor=sample(LETTERS[1:10], sample.size, TRUE))
res <- function(m) {
  # Unique (Name, Type) pairs, sorted; these become the rows of the result
  m.12.unique <- unique(m[,1:2])
  m.12.unique <- m.12.unique[order(m.12.unique[,1], m.12.unique[,2]),]
  v1 <- paste(m.12.unique[,1], m.12.unique[,2], sep=".")
  # Sorted predictor labels; as.character() keeps the labels if column 3 is a factor
  v2 <- sort(unique(as.character(m[,3])))
  res <- matrix(0, nrow=length(v1), ncol=length(v2), dimnames=list(v1, v2))
  m.ids <- paste(m[,1], m[,2], sep=".")
  # Increment the matching cell once per input row
  for(i in 1:nrow(m)) {
    x <- m.ids[i]
    y <- as.character(m[i,3])
    res[x, y] <- res[x, y] + 1
  }
  res <- data.frame(m.12.unique[,1], m.12.unique[,2], res, row.names=NULL)
  colnames(res) <- c("Name", "Type", v2)
  return(res)
}
res(m)
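Since the loop only counts how often each Predictor occurs for each (Name, Type) pair, the same counts can probably be obtained without visiting rows one at a time. The following is a minimal sketch of that idea, not a tested solution for the 15 million row case: it uses match() to turn each row into a (row, column) position and tabulate() to count the positions in one pass. The name count_wide is hypothetical, and the sketch assumes the first three columns are Name, Type and Predictor, as in the data above.

```r
# Hypothetical helper (not part of the original post): count Predictor
# occurrences per (Name, Type) pair without an explicit row loop.
count_wide <- function(m) {
  # Unique (Name, Type) pairs, sorted by Name and then Type
  keys <- unique(m[, 1:2])
  keys <- keys[order(keys[, 1], keys[, 2]), ]
  key.ids <- paste(keys[, 1], keys[, 2], sep = ".")
  row.ids <- paste(m[, 1], m[, 2], sep = ".")
  preds <- sort(unique(as.character(m[, 3])))

  # Integer row and column position of every observation
  i <- match(row.ids, key.ids)
  j <- match(as.character(m[, 3]), preds)

  # Column-major linear index of each (i, j) cell, counted in one pass
  counts <- matrix(tabulate((j - 1L) * length(key.ids) + i,
                            nbins = length(key.ids) * length(preds)),
                   nrow = length(key.ids), ncol = length(preds))

  out <- data.frame(keys[, 1], keys[, 2], counts, row.names = NULL)
  colnames(out) <- c("Name", "Type", preds)
  out
}

# Usage on simulated data of the same shape as above
sample.size <- 10000
m <- data.frame(Name = sample(1:100000, sample.size, TRUE),
                Type = sample(1:1000, sample.size, TRUE),
                Predictor = sample(LETTERS[1:10], sample.size, TRUE))
wide <- count_wide(m)
```

The counts in the predictor columns of wide should sum to sample.size, which gives a quick sanity check before trying larger inputs.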
> sessionInfo()
R version 2.8.0 (2008-10-20)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base