tabulate
Peter Dalgaard BSA
p.dalgaard@biostat.ku.dk
25 Jan 2000 00:24:22 +0100
Bill Venables <William.Venables@cmis.CSIRO.AU> writes:
> OK Peter. This is the first one I cooked up:
...
> > m <- rpois(100000, 1)
> > tabulate(m)
> [1] 36891 18399 6064 1519 309 50 4 1
> > table(m)
> m
> 0 1 2 3 4 5 6 7 8
> 36763 36891 18399 6064 1519 309 50 4 1
> > system.time(tabulate(m))
> [1] 0.11 0.00 0.00 0.00 0.00
> > system.time(table(m))
> [1] 2.90 0.16 4.00 0.00 0.00
> > version
OK first, notice that I get:
> system.time(table(m))
[1] 3.38 0.00 3.38 0.00 0.00
> system.time(f<-factor(m))
[1] 2.12 0.00 2.12 0.00 0.00
> system.time(table(f))
[1] 1.19 0.00 1.20 0.00 0.00
so most of the time really goes into factor(). With some care about
the innards of table(), the time for the table(f) step can be shaved to
> system.time(tab2(f))
[1] 0.66 0.01 0.67 0.00 0.00
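The source of tab2() was not posted, so the following is only a guess at the kind of shortcut meant here: a minimal sketch that counts the integer codes of an existing factor with tabulate() and skips the general machinery in table(). The name tab_sketch is hypothetical, and the sketch handles a single factor only.

```r
# Hypothetical helper, NOT the tab2() timed above (its source was not shown).
# Assumes a single factor as input.
tab_sketch <- function(f) {
  stopifnot(is.factor(f))
  lev <- levels(f)
  # The codes of a factor run from 1 to length(lev), so tabulate() can
  # count them directly; NA codes are ignored by tabulate().
  counts <- tabulate(as.integer(f), nbins = length(lev))
  names(counts) <- lev
  counts
}

set.seed(1)
m <- as.integer(rpois(100000, 1))
f <- factor(m)

# The counts should agree with table(f), level by level.
all(tab_sketch(f) == as.vector(table(f)))
```

Because the factor's first level is "0", this also avoids the pitfall in the quoted tabulate(m) output, where the zeros are silently dropped.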
Rather interestingly, the non-constant-time part of table() would seem
equivalent to
> system.time(as.integer(0)+as.integer(1)*(as.integer(f)-as.integer(1)))
[1] 0.25 0.00 0.25 0.00 0.00
> system.time(as.integer(0)+as.integer(1)*(as.integer(f)-as.integer(1)))
[1] 0.07 0.00 0.07 0.00 0.00
Notice the huge difference between the two executions, indicating that the
number of garbage collections involved probably plays a major role.
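One way to check the garbage-collection explanation is to repeat the timing several times, once without and once with a collection forced just before each run. The sketch below assumes a current version of R, where system.time() takes a gcFirst argument; the helper name time_once is made up for illustration, and the numbers will of course differ from the ones above.

```r
set.seed(1)
f <- factor(as.integer(rpois(100000, 1)))

# Hypothetical helper: time the expression once and return the elapsed time.
# gc_first = TRUE forces a garbage collection immediately before timing.
time_once <- function(gc_first) {
  system.time(0L + 1L * (as.integer(f) - 1L), gcFirst = gc_first)[["elapsed"]]
}

no_gc   <- replicate(5, time_once(FALSE))
with_gc <- replicate(5, time_once(TRUE))

# One row per setting, one column per repetition.
rbind(no_gc, with_gc)
```

If collections are what drive the variation, the with_gc row should be the more stable of the two.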
On the whole it doesn't really seem to be worth it to optimize this
very heavily, but if you have any obvious improvements for factor()...
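For what it is worth, here is a rough sketch of the sort of shortcut one might try for the special case of integer input: build the codes directly with match() against the sorted unique values. This is an assumption-laden illustration, not a proposed patch to factor(); the name int_factor_sketch is hypothetical, and whether it is actually faster would have to be measured.

```r
# Hypothetical special case of factor() for integer vectors only.
int_factor_sketch <- function(x) {
  stopifnot(is.integer(x))
  lev <- sort(unique(x))        # sort() drops NA, so NA never becomes a level
  codes <- match(x, lev)        # integer codes 1..length(lev); NA stays NA
  structure(codes, levels = as.character(lev), class = "factor")
}

set.seed(1)
m <- as.integer(rpois(100000, 1))
a <- int_factor_sketch(m)
b <- factor(m)

# Same levels and same codes as the general-purpose factor().
identical(levels(a), levels(b)) && identical(as.integer(a), as.integer(b))
```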
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk) FAX: (+45) 35327907