[R] lazy evaluation (was RE: Number of replications of a term)
Liaw, Andy
andy_liaw at merck.com
Wed Jan 25 02:01:01 CET 2006
From: Thomas Lumley
>
> On Wed, 25 Jan 2006, Ray Brownrigg wrote:
>
> > There's an even faster one, which nobody seems to have mentioned yet:
> >
> > rep(l <- rle(ids)$lengths, l)
>
> I considered this but it wasn't clear to me from the initial post that
> each ID occupied a contiguous section of the vector.
>
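To make the contiguity point concrete, here is a small sketch using a made-up ids vector (not the original poster's data) in which the same ID appears in two separate runs. rle() counts runs rather than IDs, so the two approaches disagree:

```r
# Hypothetical data: "ID1" and "ID2" each occur in non-adjacent positions.
ids <- c("ID1", "ID2", "ID1", "ID2", "ID2")

# Run-length approach: counts the length of each consecutive run.
l <- rle(ids)$lengths
by_run <- rep(l, l)

# Table approach: counts every occurrence of each ID, wherever it appears.
by_id <- as.vector(table(ids)[ids])

by_run  # 1 1 1 2 2
by_id   # 2 3 2 3 3
```

If the IDs are known to be sorted or otherwise grouped, the two results agree.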
> Also, lazy evaluation makes code like this
>   rep(l <- rle(ids)$lengths, l)
> a bit worrying. It relies on rep() using the first argument before it uses
> the second one. In this case, clearly, it works, but it is not a style I
> would encourage and it's easy to construct functions where it fails.
Indeed. Here's a trivial example:
> f <- function(x, y) {
+ print(y)
+ x + y
+ }
> f(a <- 3, a)
Error in print(y) : object "a" not found
Without the print(), the function would work just fine: x + y evaluates x
first, so the assignment to a happens before y is looked up.
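One way to sidestep the problem, sketched under the assumption that you control either the call or the function: do the assignment on its own line before the call, or have the function force its first argument before it touches the second. The name f2 is only for illustration.

```r
ids <- c("ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID5")

# Option 1: assign first, then call. Nothing depends on the order in which
# rep() happens to evaluate its arguments.
l <- rle(ids)$lengths
counts <- rep(l, l)
counts  # 1 2 2 3 3 3 1

# Option 2 (hypothetical f2): force(x) evaluates the promise for x up front,
# so the assignment `a <- 3` has happened by the time y is evaluated.
f2 <- function(x, y) {
  force(x)
  print(y)
  x + y
}
res <- f2(a <- 3, a)
```

Option 1 is probably the safer habit, since it also works with functions you cannot modify.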
Andy
> -thomas
>
>
>
> > Timing on my 2.8GHz NetBSD system shows:
> >
> >> length(ids)
> > [1] 45150
> >> # Gabor:
> >> system.time(for (i in 1:100) ave(as.numeric(factor(ids)), ids, FUN = length))
> > [1] 3.45 0.06 3.54 0.00 0.00
> >> # Barry (and others I think):
> >> system.time(for (i in 1:100) table(ids)[ids])
> > [1] 2.13 0.05 2.20 0.00 0.00
> >> # Me:
> >> system.time(for (i in 1:100) rep(l <- rle(ids)$lengths, l))
> > [1] 1.60 0.00 1.62 0.00 0.00
> >
> > Of course the difference between 21 milliseconds and 16 milliseconds is
> > not great, unless you are doing this a lot.
> >
> > Ray Brownrigg
> >
> >> From: Gabor Grothendieck <ggrothendieck at gmail.com>
> >>
> >> Nice. I timed it and it's much faster than mine too.
> >>
> >> On 1/24/06, Barry Rowlingson <B.Rowlingson at lancaster.ac.uk> wrote:
> >>> Laetitia Marisa wrote:
> >>>> Hello,
> >>>>
> >>>> Is there a simple and fast function that returns a vector of the number
> >>>> of replications for each object of a vector ?
> >>>> For example :
> >>>> I have a vector of IDs :
> >>>> ids <- c( "ID1", "ID2", "ID2", "ID3", "ID3","ID3", "ID5")
> >>>>
> >>>> I want the function to return the following vector, where each term is the
> >>>> number of replicates for the given id :
> >>>> c( 1, 2, 2, 3,3,3,1 )
> >>>
> >>> One-liner:
> >>>
> >>> > table(ids)[ids]
> >>> ids
> >>> ID1 ID2 ID2 ID3 ID3 ID3 ID5
> >>>   1   2   2   3   3   3   1
> >>>
> >>> 'table(ids)' computes the counts, then the subscripting [ids] looks it
> >>> all up.
> >>>
> >>> Now try it on your 40,000-long vector!
> >>>
> >>> Barry
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> >
>
> Thomas Lumley Assoc. Professor, Biostatistics
> tlumley at u.washington.edu University of Washington, Seattle