[Rd] speeding up [.data.frame

Prof Brian Ripley ripley@stats.ox.ac.uk
Sun, 6 Jan 2002 07:52:09 +0000 (GMT)

On Sat, 5 Jan 2002, Warnes, Gregory R wrote:

> (I'm up too late so this might come through garbled...)
> I've just been doing some bootstrapping on data frames and I discovered that
> S-plus 6.0r1 was a *lot* faster than R 1.3.1 at the task.  Splus was
> completing 100 bootstrap iterations in about 4 seconds while R was taking
> about 15 seconds. However, doing bootstrapping on equivalent *matrices* R
> was slightly faster, 1.5 seconds versus 1.86.
> Now, since I'm doing glm's inside the bootstrap, I really need to use data
> frames...

Why?  Surely you should be working at the design matrix level and calling
glm.fit directly?   Otherwise you are repeating a lot of work for every
bootstrap fit.

BTW, is 11 seconds worth saving?  It sounds trivial to me.  But if it is,
moving to glm.fit looks to me to be the best optimization.
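A minimal sketch of the design-matrix approach suggested above. It assumes a logistic regression on simulated data: the names `dat`, `x1`, `x2`, `y`, `X`, `Y` and the choice of 100 resamples are illustrative and do not come from the thread. It is written for a current R release. The model frame and design matrix are built once. Each bootstrap replicate then only subsets the matrix and the response before calling `glm.fit()`.

```r
## Sketch (assumed setup): bootstrap a logistic regression by resampling rows
## of the design matrix and calling glm.fit() directly, so the formula is
## processed and the model matrix built only once.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.5 * dat$x1 - 0.25 * dat$x2))

## Build the model frame, design matrix and response once, outside the loop.
mf <- model.frame(y ~ x1 + x2, data = dat)
X  <- model.matrix(attr(mf, "terms"), mf)
Y  <- model.response(mf)

B <- 100
coefs <- matrix(NA_real_, nrow = B, ncol = ncol(X),
                dimnames = list(NULL, colnames(X)))
for (b in seq_len(B)) {
  idx <- sample.int(n, n, replace = TRUE)   # bootstrap row indices
  fit <- glm.fit(X[idx, , drop = FALSE], Y[idx], family = binomial())
  coefs[b, ] <- fit$coefficients
}
apply(coefs, 2, sd)   # bootstrap standard errors of the coefficients
```

No data frame is subset inside the loop, so the `[.data.frame` overhead discussed in this thread never arises.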

> It turns out that one of the reasons S-plus is faster on data frames is that
> S-Plus allows you to turn off checking for/resolution of duplicate row
> names in "[.data.frame" by setting an attribute 'dup.row.names' to any
> non-NULL value.  Adding an additional argument to R's "[.data.frame"  (patch
> below) to permit the same optimization and using the argument in my
> bootstrap function reduced the elapsed time for R to 8.6 seconds.
> Still, I'm wondering if there are other 'reasonable' changes to
> "[.data.frame" that could narrow the gap further...

That one is not reasonable in my opinion.  It should not be in S-PLUS (and
the advisory board has discussed its removal, as I recall).  Having unique
row names is a fundamental property of data frames.  What you and they
seem to want is another class which is like data frames but does not
require row names, from which data.frame could inherit.
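The row-name work at issue shows up as soon as rows are resampled with replacement. The toy sketch below uses hypothetical names `df`, `big`, `bm`, `i`. In current versions of R, `[.data.frame` de-duplicates repeated row names with `make.unique()`. Subsetting the equivalent matrix does not do this work. The timing lines are only for a local comparison, and no particular figures are implied.

```r
## Toy data frame with automatic row names "1", "2", "3".
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

## A bootstrap-style index that repeats rows 1 and 3.
idx <- c(1, 1, 2, 3, 3)
sub <- df[idx, ]

## Repeated rows get suffixed names so that the row names stay unique.
rownames(sub)   # "1" "1.1" "2" "3" "3.1"

## Rough timing comparison on a larger object (figures vary by machine).
set.seed(1)
big <- as.data.frame(matrix(rnorm(1e4 * 5), ncol = 5))
bm  <- as.matrix(big)
i   <- sample.int(nrow(big), replace = TRUE)
system.time(for (k in 1:20) big[i, ])
system.time(for (k in 1:20) bm[i, ])
```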

Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html