[R] How should I improve the following R code?
Seung Jun
seungwjun at gmail.com
Tue Jan 8 00:49:33 CET 2008
I'm looking for a way to improve code that's proven to be inefficient.
Suppose that a data source generates the following table every minute:
Index Count
------------
0 234
1 120
7 11
30 1
I save the tables in the following CSV format:
time,index,count
0,0:1:7:30,234:120:11:1
1,0:2:3:19,199:110:87:9
That is, each line represents a table, and I have N lines for N minutes of
data collection.
Now, I wrote the following code to get quantiles for each time period:
library(Hmisc)
# Keep index and count as character strings so strsplit() accepts them.
stbl <- read.csv("data.csv", stringsAsFactors = FALSE)
index <- lapply(strsplit(stbl$index, ":", fixed = TRUE), as.numeric)
count <- lapply(strsplit(stbl$count, ":", fixed = TRUE), as.numeric)
len <- length(index)
for (i in seq_len(len)) {
  # Weighted quantiles of the index values, using the counts as weights.
  v <- wtd.quantile(index[[i]], count[[i]], c(0, 0.2, 0.5, 0.8, 1))
  stbl$q0[i] <- v[1]
  stbl$q2[i] <- v[2]
  stbl$q5[i] <- v[3]
  stbl$q8[i] <- v[4]
  stbl$q10[i] <- v[5]
}
It works fine for a small N, but it quickly becomes inefficient as N grows:
the for-loop takes too long. How could I improve the code or the data
representation so that it runs faster?
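One direction that might help, given only as a minimal sketch and not as a measured fix: compute all the quantiles first and attach them to the data frame in a single step, so the data frame is not modified five times per row. The helper expanded_quantile() is a hypothetical base-R stand-in for wtd.quantile() that only makes the sketch runnable without Hmisc. It assumes integer counts and may not return exactly what wtd.quantile() returns; wtd.quantile() could be passed to mapply() in its place. The inline csv_text simply repeats the two sample lines above.

```r
# Hypothetical stand-in for Hmisc::wtd.quantile(): repeat each index value
# by its (integer) count, then take ordinary quantiles of the result.
expanded_quantile <- function(x, w, probs) {
  quantile(rep(x, w), probs, names = FALSE)
}

probs <- c(0, 0.2, 0.5, 0.8, 1)

# Inline copy of the two sample lines, in place of data.csv.
csv_text <- "time,index,count
0,0:1:7:30,234:120:11:1
1,0:2:3:19,199:110:87:9"
stbl <- read.csv(text = csv_text, stringsAsFactors = FALSE)

index <- lapply(strsplit(stbl$index, ":", fixed = TRUE), as.numeric)
count <- lapply(strsplit(stbl$count, ":", fixed = TRUE), as.numeric)

# One call per table; the result is a matrix with one row per probability
# and one column per table.
q <- mapply(expanded_quantile, index, count, MoreArgs = list(probs = probs))

# Transpose to one row per table and attach all five columns at once.
qdf <- as.data.frame(t(q))
names(qdf) <- c("q0", "q2", "q5", "q8", "q10")
stbl <- cbind(stbl, qdf)
```

Whether this is actually faster would need timing on the real data, for example with system.time(); if the quantile computation itself dominates, changing how the results are stored will not help much.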
Thanks,
Seung