[R] counting the occurrences of vectors

Gabor Grothendieck ggrothendieck at myway.com
Tue Jul 6 06:22:22 CEST 2004


Marc Schwartz <MSchwartz <at> MedAnalytics.com> writes:

> the likely overhead involved in paste()ing together the rows
> to create objects 


I thought I would check this, and it seems that in my original f1 function 
it's not really the paste itself that's the bottleneck but applying the 
paste.  If we use do.call rather than apply, as shown in f1a below, then 
f1a runs faster than row.match.count (which in turn was faster
than f1):

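f1a <- function(a,b,sep=":") {
	# paste the columns together in one vectorised call: one key per row
	f <- function(...) paste(..., sep=sep)
	a2 <- do.call("f", as.data.frame(a))
	b2 <- do.call("f", as.data.frame(b))
	# for each row of a, count the matching rows of b; unique(a2) is appended
	# so that every key of a2 is in the table, hence the - 1
	c(table(c(b2,unique(a2)))[a2] - 1)
}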
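
To illustrate the difference between the two ways of building the row keys, here is a minimal sketch on a small made-up matrix. The original f1 is not repeated in this post, so the apply() line is only my assumption of what the row-by-row version looks like; the names m, keys_apply and keys_docall are hypothetical.

```r
# Hypothetical small matrix, just for illustration
m <- matrix(c(1, 2, 3,
              1, 2, 3,
              3, 1, 2), ncol = 3, byrow = TRUE)

# Row by row: apply() calls paste() once per row (assumed shape of the slow version)
keys_apply <- apply(m, 1, paste, collapse = ":")

# Column-wise: one vectorised paste() call over the columns, as in f1a
keys_docall <- do.call(paste, c(as.data.frame(m), sep = ":"))

# Both approaches should give the same key for each row
identical(keys_apply, keys_docall)
```

The keys are the same either way; the do.call version just makes a single call to paste on whole columns instead of one call per row, which is presumably where the time goes.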

> set.seed(1)
> # note that we have increased the size of the matrices from last post
> # to better show the speed difference
> a <- matrix(sample(3,10000,rep=T),nc=5)
> b <- matrix(sample(3,1000,rep=T),nc=5)

> # row.match.count taken from Marc's post in this thread
> # have put a c(...) around row.match.count to make it comparable to f1a
> gc(); system.time(ans <- c(row.match.count(b,a)))
         used (Mb) gc trigger (Mb)
Ncells 436079 11.7     741108 19.8
Vcells 130663  1.0     786432  6.0
[1] 0.11 0.00 0.11   NA   NA

> gc(); system.time(ansf1a <- f1a(b,a))
         used (Mb) gc trigger (Mb)
Ncells 436080 11.7     741108 19.8
Vcells 130669  1.0     786432  6.0
[1] 0.04 0.00 0.04   NA   NA

> all.equal(ansf1a,ans)
[1] TRUE
>
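
For anyone puzzled by the last line of f1a, here is a small sketch of the counting step on keys short enough to check by hand. The vectors x and y are made up; x plays the role of a2 and y the role of b2.

```r
# Hypothetical keys: x corresponds to a2 and y to b2 in f1a
x <- c("1:2", "3:1", "1:2")
y <- c("1:2", "1:2", "2:2")

# Appending unique(x) guarantees that every key in x appears in the table,
# at the cost of inflating each of those counts by exactly one
counts <- table(c(y, unique(x)))[x] - 1

# c() drops the table class and leaves a plain named vector
c(counts)
```

The key "1:2" occurs twice in y and "3:1" not at all, so the counts for the three elements of x are 2, 0 and 2.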