[R] Making routine faster by using apply instead of for-loop
William Dunlap
wdunlap at tibco.com
Tue Jan 12 21:31:20 CET 2010
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Etienne Stockhausen
> Sent: Tuesday, January 12, 2010 10:59 AM
> To: r-help at r-project.org
> Subject: [R] Making routine faster by using apply instead of for-loop
>
> Hey everybody,
>
> I have a small problem with a routine, which prepares some data for
> plotting.
> I've made a small example:
>
> c=10
> mat=data.frame(matrix(1:(c*c),c,c))
> row.names(mat)=seq(c,1,length=c)
> names(mat)=c(seq(2,c,length=c/2),seq(c,2,length=c/2))
> v=as.numeric(row.names(mat))
> w=as.numeric(names(mat))
> for(i in 1:c)
> { for(j in 1:c)
> {
> if(v[j]+w[i]<=c)(mat[i,j]=NA)
> }}
>
> This produces exactly the data I need to go on, but if I increase the
> constant c ,to for instance 500 , it takes a very long time
> to set the NA's.
The first problem is that random (element-by-element)
access to a data.frame is much slower than the equivalent
access to a matrix. Rewriting your code a bit to
use a matrix speeds up the c=500 case by a factor of 750.
f0 <- function (c = 10) {
    mat = matrix(1:(c * c), c, c)
    rownames(mat) = seq(c, 1, length = c)
    colnames(mat) = c(seq(2, c, length = c/2), seq(c, 2, length = c/2))
    v = as.numeric(rownames(mat))
    w = as.numeric(colnames(mat))
    for (i in 1:c) {
        for (j in 1:c) {
            if (v[j] + w[i] <= c) {
                mat[i, j] = NA
            }
        }
    }
    mat
}
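To get a rough feel for the data.frame versus matrix gap on your own machine, something like the following sketch could be used. The helper fill_cells() and the size n = 100 are made up for illustration (they are not part of the original code), and the timings will differ by machine and R version.

```r
# Hypothetical helper (not from the original post): overwrite every
# cell of x one at a time, the same access pattern as the double loop.
fill_cells <- function(x) {
    n <- nrow(x)
    for (i in seq_len(n)) {
        for (j in seq_len(n)) {
            x[i, j] <- 0L
        }
    }
    x
}

n <- 100                      # assumed size, kept small so the data.frame case finishes quickly
m <- matrix(1:(n * n), n, n)  # integer matrix
d <- data.frame(m)            # same values stored as a data.frame

# Time the identical loop on both containers.
t_mat <- system.time(res_mat <- fill_cells(m))[["elapsed"]]
t_df  <- system.time(res_df  <- fill_cells(d))[["elapsed"]]
cat("matrix:", t_mat, "s   data.frame:", t_df, "s\n")
```

Each `x[i, j] <- value` on a data.frame goes through the `[<-.data.frame` method, which does far more work per call than indexing into a plain matrix.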
Rewriting f0 to insert the NA's in one operation speeds it up by
another factor of 10 (in the c=500 case):
f1 <- function (c = 10) {
    v <- seq(c, 1, length = c)
    w <- c(seq(2, c, length = c/2), seq(c, 2, length = c/2))
    mat <- matrix(1:(c * c), nrow = c, ncol = c, dimnames = list(v, w))
    mat[outer(w, v, `+`) <= c] <- NA
    mat
}
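A quick way to convince yourself that the vectorized version sets the same cells to NA as the loop is to compare the two results directly. The sketch below repeats f0 and f1 so it can be run on its own; the variable names a, b and same are only for illustration.

```r
# Loop version, as above.
f0 <- function (c = 10) {
    mat = matrix(1:(c * c), c, c)
    rownames(mat) = seq(c, 1, length = c)
    colnames(mat) = c(seq(2, c, length = c/2), seq(c, 2, length = c/2))
    v = as.numeric(rownames(mat))
    w = as.numeric(colnames(mat))
    for (i in 1:c) {
        for (j in 1:c) {
            if (v[j] + w[i] <= c) {
                mat[i, j] = NA
            }
        }
    }
    mat
}

# Vectorized version, as above.
f1 <- function (c = 10) {
    v <- seq(c, 1, length = c)
    w <- c(seq(2, c, length = c/2), seq(c, 2, length = c/2))
    mat <- matrix(1:(c * c), nrow = c, ncol = c, dimnames = list(v, w))
    # outer(w, v, `+`)[i, j] is w[i] + v[j], the same sum the loop tests
    mat[outer(w, v, `+`) <= c] <- NA
    mat
}

a <- f0(10)
b <- f1(10)
same <- isTRUE(all.equal(a, b))   # compares values, NA positions and dimnames
print(same)
print(sum(is.na(b)))              # number of cells set to NA
```

The logical matrix produced by outer() has the same shape as mat, so it can be used directly as an index on the left-hand side of the assignment.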
If you really want a data.frame, pass the output of these functions
into data.frame() (with check.names=FALSE, since the column
names are not considered legal on a data.frame: they contain
duplicates and look numeric).
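As an illustration of what check.names does here, the following sketch converts a small result both ways. It repeats f1 so it runs on its own; c = 4 and the names df_raw and df_fixed are chosen only for the example.

```r
# Vectorized version from above, repeated so this runs stand-alone.
f1 <- function (c = 10) {
    v <- seq(c, 1, length = c)
    w <- c(seq(2, c, length = c/2), seq(c, 2, length = c/2))
    mat <- matrix(1:(c * c), nrow = c, ncol = c, dimnames = list(v, w))
    mat[outer(w, v, `+`) <= c] <- NA
    mat
}

m <- f1(4)

# Keep the matrix's column names as they are (duplicated, numeric-looking).
df_raw <- data.frame(m, check.names = FALSE)

# Default check.names = TRUE: names are passed through make.names()
# and made unique.
df_fixed <- data.frame(m)

print(names(df_raw))
print(names(df_fixed))
```

Keep in mind that with duplicated names, selecting a column by name (for example with `$` or `[["..."]]`) only reaches the first match, so positional indexing is the safer choice on df_raw.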
By the way, it is generally a bad idea to use apply() on
a data.frame. It is meant for matrices.
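One reason, shown in the sketch below: apply() converts its argument with as.matrix() first, so a data.frame with mixed column types is turned into a character matrix before your function sees any of it. The data.frame d and its column names are invented for the example.

```r
# A small data.frame with one integer and one character column
# (made-up data, for illustration only).
d <- data.frame(x = 1:3, label = c("a", "b", "c"),
                stringsAsFactors = FALSE)

# Column classes as stored in the data.frame.
col_classes <- sapply(d, class)
print(col_classes)

# apply() works on as.matrix(d); with mixed types that is a character matrix.
as_mat <- as.matrix(d)
print(is.character(as_mat))

# So every row handed to the function is a character vector.
row_classes <- apply(d, 1, class)
print(row_classes)
```

For column-wise work on a data.frame, lapply() or sapply() operate on the columns directly and keep each column's own type.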
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> I've heard there is a much faster way to set the NA's using
> the command
> apply( ), but I don't know how.
> I'm looking forward for any ideas or hints, that might help me.
>
> Best regards
>
> Etienne
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>