tabulate
Prof Brian D Ripley
ripley@stats.ox.ac.uk
Tue, 25 Jan 2000 07:04:31 +0000 (GMT)
On 25 Jan 2000, Peter Dalgaard BSA wrote:
> Bill Venables <William.Venables@cmis.CSIRO.AU> writes:
>
> > OK Peter. This is the first one I cooked up:
> ...
> > > m <- rpois(100000, 1)
> > > tabulate(m)
> > [1] 36891 18399 6064 1519 309 50 4 1
> > > table(m)
> > m
> >     0     1     2     3     4     5     6     7     8
> > 36763 36891 18399  6064  1519   309    50     4     1
> > > system.time(tabulate(m))
> > [1] 0.11 0.00 0.00 0.00 0.00
> > > system.time(table(m))
> > [1] 2.90 0.16 4.00 0.00 0.00
> > > version
>
> OK first, notice that I get:
>
> > system.time(table(m))
> [1] 3.38 0.00 3.38 0.00 0.00
> > system.time(f<-factor(m))
> [1] 2.12 0.00 2.12 0.00 0.00
> > system.time(table(f))
> [1] 1.19 0.00 1.20 0.00 0.00
>
> so most of the time really goes into factor(). If one is careful about
> the innards of table() one can shave the time for that to
>
> > system.time(tab2(f))
> [1] 0.66 0.01 0.67 0.00 0.00
>
> Rather interestingly, the non constant time part of table would seem
> equivalent to
>
> > system.time(as.integer(0)+as.integer(1)*(as.integer(f)-as.integer(1)))
> [1] 0.25 0.00 0.25 0.00 0.00
> > system.time(as.integer(0)+as.integer(1)*(as.integer(f)-as.integer(1)))
> [1] 0.07 0.00 0.07 0.00 0.00
>
> Notice the huge difference between the two executions, indicating that the
> number of garbage collections involved probably plays a major role.
>
> On the whole it doesn't really seem to be worth it to optimize this
> very heavily, but if you have any obvious improvements for factor()...
Almost all the time is going on match(), which is 25x slower on my system
than S-PLUS 5.1 for this example. I recall that we see this problem
in slow model.* manipulations too.
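
One way to check where the time goes is to time the pieces separately. The sketch below assumes that factor() on integer data amounts roughly to unique(), sort() and match(); that decomposition is an assumption for illustration, not the actual source of factor(), and the seed and variable names are arbitrary. Timings will differ by system and R version.

```r
# Rough decomposition of factor() for integer data (assumed for
# illustration; this is not the actual source of factor()).
set.seed(1)                                   # arbitrary seed
m <- rpois(100000, 1)

t_factor <- system.time(f   <- factor(m))     # the whole operation
t_unique <- system.time(u   <- unique(m))     # find the distinct values
t_sort   <- system.time(lev <- sort(u))       # order them to get the levels
t_match  <- system.time(idx <- match(m, lev)) # map each value to its level code

# The codes from match() should agree with the factor's integer codes.
stopifnot(identical(idx, as.integer(f)))

# Elapsed seconds for each piece.
print(c(factor = t_factor[["elapsed"]],
        unique = t_unique[["elapsed"]],
        sort   = t_sort[["elapsed"]],
        match  = t_match[["elapsed"]]))
```

If match() accounts for most of the factor() figure, that supports the diagnosis above; if not, the cost lies elsewhere on that system.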
I don't know why match is slow: it does use hashing but may not be
optimized for matches into small sets.
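
For data known to be non-negative integers, as in the rpois() example quoted above, the match() step can be avoided altogether. tabulate() counts only positive integers, which is why the zeros are missing from its output above, so shifting the data by one recovers the full table. This is a sketch under that assumption, not a general replacement for table().

```r
# Counting non-negative integers without factor()/match().
# Assumes m holds non-negative integers only.
set.seed(1)                                   # arbitrary seed
m <- rpois(100000, 1)

# Shift by one so that 0 falls into bin 1, 1 into bin 2, and so on.
counts <- tabulate(m + 1L, nbins = max(m) + 1L)
names(counts) <- 0:max(m)

# table() drops values that never occur, so compare non-empty bins only.
tab <- table(m)
stopifnot(identical(unname(counts[counts > 0]), as.vector(tab)))
stopifnot(identical(names(counts)[counts > 0], names(tab)))

print(counts)
```

Unlike table(m), this keeps a zero count for any value between 0 and max(m) that does not occur in the data.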
--
Brian D. Ripley, ripley@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595