[R] lazy evaluation (was RE: Number of replications of a term)

Liaw, Andy andy_liaw at merck.com
Wed Jan 25 02:01:01 CET 2006


From: Thomas Lumley
> 
> On Wed, 25 Jan 2006, Ray Brownrigg wrote:
> 
> > There's an even faster one, which nobody seems to have mentioned yet:
> >
> > rep(l <- rle(ids)$lengths, l)
> 
> I considered this but it wasn't clear to me from the initial post that
> each ID occupied a contiguous section of the vector.
> 
> Also, lazy evaluation makes code like this
>     rep(l <- rle(ids)$lengths, l)
> a bit worrying. It relies on rep() using the first argument before it uses
> the second one.  In this case, clearly, it works, but it is not a style I
> would encourage and it's easy to construct functions where it fails.

Indeed.  Here's a trivial example:

> f <- function(x, y) {
+     print(y)
+     x + y
+ }
> f(a <- 3, a)
Error in print(y) : object "a" not found

Without the print(), the function would work just fine: x + y evaluates x
first, so a is assigned before y is looked up.
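
To make the evaluation order concrete, here is a minimal sketch. The names
f_y_first and f_x_first are hypothetical, made up for this illustration;
force() is the base R function for evaluating an argument explicitly. The
last lines show a more defensive form of the rle() call, with the assignment
done as its own statement so nothing depends on argument order.

```r
# Hypothetical helper: touches y before x, so the promise for x
# (and the assignment inside it) has not run when y is looked up.
f_y_first <- function(x, y) {
    y
    x + y
}

# Hypothetical helper: forces x explicitly before anything else.
f_x_first <- function(x, y) {
    force(x)
    y
    x + y
}

# Make sure no stray 'a' exists in the calling environment.
if (exists("a", inherits = FALSE)) rm(a)

# y is evaluated first, 'a' does not exist yet, so the call fails.
res_fail <- tryCatch(f_y_first(a <- 3, a),
                     error = function(e) "lookup failed")

# x is forced first, 'a' is assigned, and y then finds it.
res_ok <- f_x_first(a <- 3, a)

# Defensive version of the rle() idiom: assign first, then call rep().
# Like the original, it assumes each ID occupies a contiguous run.
ids <- c("ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID5")
l <- rle(ids)$lengths
counts <- rep(l, l)
```

The separate assignment costs one extra line and gives the same result
whichever argument rep() happens to evaluate first.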

Andy
 
>  	-thomas
> 
> 
> 
> > Timing on my 2.8GHz NetBSD system shows:
> >
> >> length(ids)
> > [1] 45150
> >> # Gabor:
> >> system.time(for (i in 1:100) ave(as.numeric(factor(ids)), ids, FUN = length))
> > [1] 3.45 0.06 3.54 0.00 0.00
> >> # Barry (and others I think):
> >> system.time(for (i in 1:100) table(ids)[ids])
> > [1] 2.13 0.05 2.20 0.00 0.00
> >> # Me:
> >> system.time(for (i in 1:100) rep(l <- rle(ids)$lengths, l))
> > [1] 1.60 0.00 1.62 0.00 0.00
> >
> > Of course the difference between 21 milliseconds and 16 milliseconds is
> > not great, unless you are doing this a lot.
> >
> > Ray Brownrigg
> >
> >> From: Gabor Grothendieck <ggrothendieck at gmail.com>
> >>
> >> Nice.  I timed it and it's much faster than mine too.
> >>
> >> On 1/24/06, Barry Rowlingson <B.Rowlingson at lancaster.ac.uk> wrote:
> >>> Laetitia Marisa wrote:
> >>>> Hello,
> >>>>
> >>>> Is there a simple and fast function that returns a vector of the number
> >>>> of replications for each object of a vector?
> >>>> For example :
> >>>> I have a vector of IDs :
> >>>> ids <- c( "ID1", "ID2", "ID2", "ID3", "ID3","ID3", "ID5")
> >>>>
> >>>> I want the function to return the following vector, where each term is the
> >>>> number of replicates for the given ID:
> >>>> c( 1, 2, 2, 3,3,3,1 )
> >>>
> >>> One-liner:
> >>>
> >>> > table(ids)[ids]
> >>> ids
> >>> ID1 ID2 ID2 ID3 ID3 ID3 ID5
> >>>   1   2   2   3   3   3   1
> >>>
> >>>  'table(ids)' computes the counts, then the subscripting [ids] looks it
> >>> all up.
> >>>
> >>>  Now try it on your 40,000-long vector!
> >>>
> >>> Barry
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> >
> 
> Thomas Lumley			Assoc. Professor, Biostatistics
> tlumley at u.washington.edu	University of Washington, Seattle
> 



