[Rd] speeding up [.data.frame
Warnes, Gregory R
gregory_r_warnes@groton.pfizer.com
Sat, 5 Jan 2002 23:08:58 -0500
(I'm up too late so this might come through garbled...)
I've just been doing some bootstrapping on data frames and I discovered that
S-Plus 6.0r1 was a *lot* faster than R 1.3.1 at the task. S-Plus was
completing 100 bootstrap iterations in about 4 seconds while R was taking
about 15 seconds. However, when bootstrapping equivalent *matrices*, R
was slightly faster: 1.5 seconds versus 1.86.
Now, since I'm doing glm's inside the bootstrap, I really need to use data
frames...
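For anyone who wants to try a comparable timing, here is a minimal sketch of
a bootstrap loop that fits a glm on each resampled data frame. It is not my
actual code: the simulated data, the formula, and the function name boot.glm
are assumptions made up for illustration, and the timings will differ from
the ones quoted above.

```r
## Hypothetical data: a binary response and two numeric predictors.
set.seed(1)
n <- 200
dat <- data.frame(y = rbinom(n, 1, 0.5), x1 = rnorm(n), x2 = rnorm(n))

## boot.glm is an illustrative helper, not part of R.
## Each iteration resamples rows with replacement via "[.data.frame"
## and refits the model on the resampled data frame.
boot.glm <- function(data, B = 100) {
    coefs <- matrix(NA_real_, nrow = B, ncol = 3)
    for (b in seq_len(B)) {
        idx <- sample(nrow(data), replace = TRUE)
        fit <- glm(y ~ x1 + x2, family = binomial, data = data[idx, ])
        coefs[b, ] <- coef(fit)
    }
    coefs
}

## Elapsed wall-clock time for 100 bootstrap iterations.
elapsed <- system.time(res <- boot.glm(dat, B = 100))["elapsed"]
```

The row subsetting data[idx, ] is the step that goes through
"[.data.frame", so that is where any per-call overhead shows up.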
It turns out that one of the reasons S-Plus is faster on data frames is that
S-Plus allows you to turn off checking for and resolving duplicate row
names in "[.data.frame" by setting an attribute 'dup.row.names' to any
non-NULL value. Adding an extra argument to R's "[.data.frame" (patch
below) to permit the same optimization, and using that argument in my
bootstrap function, reduced the elapsed time for R to 8.6 seconds.
Still, I'm wondering if there are other 'reasonable' changes to
"[.data.frame" that could narrow the gap further...
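To show where the duplicate row names come from, here is a small sketch
using only stock R (no patch applied). The toy data frame and the index
vector are made up for illustration. Resampling with replacement repeats row
indices, so the subsetting code has to detect the repeats and rename the
affected rows on every call.

```r
## Toy data frame with default row names "1", "2", "3".
df <- data.frame(x = c(10, 20, 30), g = c("a", "b", "c"))

## A bootstrap-style index vector with repeated rows.
idx <- c(1, 1, 3, 3, 3)

## "[.data.frame" notices the repeated row names and makes them unique.
boot <- df[idx, ]
rn <- rownames(boot)

## The data are repeated as requested; the row names are not.
any.dups <- anyDuplicated(rn) > 0
```

In a bootstrap the resampled row names are usually never looked at, which is
why skipping this step is attractive.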
-Greg
################ PATCH STARTS HERE #################
diff -c /Volumes/app/R/src/R-1.4.0/src/library/base/R/dataframe.R.orig /Volumes/app/R/src/R-1.4.0/src/library/base/R/dataframe.R
*** /Volumes/app/R/src/R-1.4.0/src/library/base/R/dataframe.R.orig Sat Jan 5 22:58:10 2002
--- /Volumes/app/R/src/R-1.4.0/src/library/base/R/dataframe.R Sat Jan 5 22:58:10 2002
***************
*** 323,329 ****
### These are a little less general than S
"[.data.frame" <-
! function(x, i, j, drop = if(missing(i)) TRUE else length(cols) == 1)
{
if(nargs() < 3) {
if(missing(i))
--- 323,330 ----
### These are a little less general than S
"[.data.frame" <-
! function(x, i, j, drop = if(missing(i)) TRUE else length(cols) == 1,
! dup.row.names = FALSE)
{
if(nargs() < 3) {
if(missing(i))
***************
*** 390,396 ****
}
if(!drop) {
names(x) <- cols
! if(any(duplicated(rows)))
rows <- make.names(rows, unique = TRUE)
attr(x, "row.names") <- rows
class(x) <- cl
--- 391,397 ----
}
if(!drop) {
names(x) <- cols
! if (any(duplicated(rows)) && !dup.row.names)
rows <- make.names(rows, unique = TRUE)
attr(x, "row.names") <- rows
class(x) <- cl