[R] counting the occurrences of vectors
Marc Schwartz
MSchwartz at MedAnalytics.com
Tue Jul 6 14:54:50 CEST 2004
On Mon, 2004-07-05 at 23:22, Gabor Grothendieck wrote:
> Marc Schwartz <MSchwartz <at> MedAnalytics.com> writes:
>
> > the likely overhead involved in paste()ing together the rows
> > to create objects
>
>
> I thought I would check this and it seems that in my original f1 function
> it's not really the paste itself that's the bottleneck but applying the
> paste. If we use do.call rather than apply, as shown in f1a below, then
> we see that f1a runs faster than row.match.count (which in turn was faster
> than f1):
>
> f1a <- function(a,b,sep=":") {
>   f <- function(...) paste(..., sep=sep)
>   a2 <- do.call("f", as.data.frame(a))
>   b2 <- do.call("f", as.data.frame(b))
>   c(table(c(b2,unique(a2)))[a2] - 1)
> }
>
> > set.seed(1)
> > # note that we have increased the size of the matrices from last post
> > # to better show the speed difference
> > a <- matrix(sample(3,10000,rep=T),nc=5)
> > b <- matrix(sample(3,1000,rep=T),nc=5)
>
> > # row.match.count taken from Marc's post in this thread
> > # have put a c(...) around row.match.count to make it comparable to f1a
> > gc(); system.time(ans <- c(row.match.count(b,a)))
>          used (Mb) gc trigger (Mb)
> Ncells 436079 11.7     741108 19.8
> Vcells 130663  1.0     786432  6.0
> [1] 0.11 0.00 0.11 NA NA
>
> > gc(); system.time(ansf1a <- f1a(b,a))
>          used (Mb) gc trigger (Mb)
> Ncells 436080 11.7     741108 19.8
> Vcells 130669  1.0     786432  6.0
> [1] 0.04 0.00 0.04 NA NA
>
> > all.equal(ansf1a,ans)
> [1] TRUE

Gabor,

Well done! I liked your approach in the prior message of getting away
from using regex. I had one of those "I could'a had a V-8" moments, when
I realized that of course the resultant table names were syntactically
correct R statements and therefore one could get away from worrying
about the data type issues and use eval(parse(...)).

The above approach is better yet: more flexible, more elegant, and
notably faster.
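
To make the apply-versus-do.call point concrete, here is a minimal sketch of the two ways of building the row keys. The matrix m and the names paste_cols, keys_apply and keys_docall are illustrative assumptions, not taken from the thread; only paste(), apply(), do.call() and as.data.frame() are used, as in f1a above.

```r
# Illustrative data: a 2000 x 5 matrix of small integers (hypothetical).
set.seed(1)
m <- matrix(sample(3, 10000, replace = TRUE), ncol = 5)

# apply() invokes paste() once per row, i.e. 2000 separate calls here.
keys_apply <- apply(m, 1, paste, collapse = ":")

# do.call() hands each column to paste() as one argument, so paste()
# is called a single time and works on whole columns at once.
paste_cols <- function(...) paste(..., sep = ":")
keys_docall <- do.call(paste_cols, as.data.frame(m))

# Both routes produce the same one-key-per-row character vector.
stopifnot(identical(keys_apply, keys_docall))
```

Wrapping each construction in system.time() should show the difference on a given machine; no timings are claimed here beyond those Gabor posted.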

Advantage Gabor... ;-)

Best regards,

Marc
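
P.S. For anyone following the thread, the counting step on the last line of f1a may be easier to see on a toy input. This is only a sketch: the key vectors a2 and b2 below are made-up values standing in for the pasted rows, and the name counts is hypothetical.

```r
# Hypothetical pasted row keys (stand-ins for the a2 and b2 built in f1a).
a2 <- c("1:2", "3:3", "1:2", "2:1")
b2 <- c("1:2", "1:2", "2:1", "9:9")

# Appending unique(a2) guarantees every key of a2 occurs in the table at
# least once, so indexing by a2 never misses; subtracting 1 removes that
# artificial extra occurrence again.
counts <- c(table(c(b2, unique(a2)))[a2] - 1)

# One count per element of a2: how often that key occurs in b2.
print(counts)
```

Here "1:2" occurs twice in b2, "3:3" not at all and "2:1" once, so the counts for the four elements of a2 come out as 2, 0, 2, 1.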