[Rd] extracting rows from a data frame by looping over the row names: performance issues
Herve Pages
hpages at fhcrc.org
Sat Mar 3 03:03:57 CET 2007
Hi Greg,
Greg Snow wrote:
> Your 2 examples have 2 differences and they are therefore confounded in
> their effects.
>
> What are your results for:
>
> system.time(for (i in 1:100) {row <- dat[i, ] })
>
>
>
Right. What you suggest is even faster (and simpler):
> mat <- matrix(rep(paste(letters, collapse=""), 5*300000), ncol=5)
> dat <- as.data.frame(mat)
> system.time(for (key in row.names(dat)[1:100]) { row <- dat[key, ] })
user system elapsed
13.241 0.460 13.702
> system.time(for (i in 1:100) { row <- sapply(dat, function(col) col[i]) })
user system elapsed
0.280 0.372 0.650
> system.time(for (i in 1:100) {row <- dat[i, ] })
user system elapsed
0.044 0.088 0.130
So apparently extracting with dat[i, ] is here about 300 times faster in
user time (roughly 100 times in elapsed time) than extracting with
dat[key, ]!
> system.time(for (i in 1:100) dat["1", ])
user system elapsed
12.680 0.396 13.075
> system.time(for (i in 1:100) dat[1, ])
user system elapsed
0.060 0.076 0.137
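Given these timings, one possible workaround when the loop really has to be driven by row names is to translate the names to integer positions once, before the loop, and then index by position. The following is only a minimal sketch under stated assumptions: the small data frame and the names `keys`, `idx` and `rows` are hypothetical and not part of the benchmark above, and match() does exact matching only, so it may not be a drop-in replacement if partial matching of row names is relied upon.

```r
# Hypothetical small data frame; the benchmark above used 300000 rows.
dat <- data.frame(a = 1:5, b = letters[1:5],
                  row.names = c("r1", "r2", "r3", "r4", "r5"),
                  stringsAsFactors = FALSE)

# Hypothetical row names we want to extract, in this order.
keys <- c("r4", "r2")

# Translate the keys to integer positions once, outside the loop.
idx <- match(keys, row.names(dat))

# Fail early if a key is not an exact row name.
stopifnot(!any(is.na(idx)))

# Extract each row by integer position instead of by name.
rows <- vector("list", length(idx))
for (j in seq_along(idx)) {
  rows[[j]] <- dat[idx[j], ]
}
```

Whether this recovers the full speed of dat[i, ] on a large data frame has not been measured here; it only moves the name lookup out of the loop.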
Good to know!
Thanks a lot,
H.