[R] using lapply

Phil Spector spector at stat.berkeley.edu
Thu Mar 10 18:51:53 CET 2011


To add to William's remarks, another advantage of the
apply family of functions is that they avoid growing an
object inside a loop, which is very inefficient in R.
In other words, without the *apply functions, users 
might do something like this:

answer = NULL
for(i in 1:nrows)
    answer = rbind(answer,calculateanewrow(i))

or

answer = NULL
for(i in 1:n)
    answer = c(answer,newcalculation(i))
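For comparison, here is a minimal sketch of the same two computations written without growing `answer`. The functions calculateanewrow() and newcalculation() are only the placeholder names used above; the toy definitions and the values of nrows and n below are assumptions made so that the code runs.

```r
# Hypothetical stand-ins for the placeholder functions used above
calculateanewrow <- function(i) c(i, i^2, i^3)  # returns one row as a numeric vector
newcalculation   <- function(i) sqrt(i)         # returns a single number

nrows <- 5
n <- 5

# Compute every row first, then bind them together in a single call
answer_rows <- do.call(rbind, lapply(seq_len(nrows), calculateanewrow))

# One value per iteration: vapply also checks that each result
# is a single numeric value
answer_vec <- vapply(seq_len(n), newcalculation, FUN.VALUE = numeric(1))
```

In both cases the result is assembled once at the end rather than being copied and extended on every iteration.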

So in addition to making your program easier to understand
(which is a huge advantage in and of itself), the *apply
functions also help you avoid a programming paradigm that's
very inefficient in R:

> mat = matrix(abs(rnorm(10000)),1000,10)
> system.time({answer=NULL;for(i in 1:nrow(mat))answer = rbind(answer,log(mat[i,]))})
    user  system elapsed
   0.052   0.020   0.072 
> system.time({answer1 = t(apply(mat,1,log))})
    user  system elapsed
   0.012   0.000   0.012 
> all.equal(answer,answer1)
[1] TRUE

That's a speedup of a factor of 6, which gets even bigger as the size
of the object increases:

> mat = matrix(abs(rnorm(100000)),10000,10)
> system.time({answer=NULL;for(i in 1:nrow(mat))answer = rbind(answer,log(mat[i,]))})
    user  system elapsed
   5.960   1.524   7.505 
> system.time({answer1 = t(apply(mat,1,log))})
    user  system elapsed
   0.120   0.004   0.123 
> all.equal(answer,answer1)
[1] TRUE

Now the speedup is about a factor of 60 -- essentially an O(n^2) algorithm
competing with an O(n) algorithm.
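The quadratic cost comes from rbind copying everything accumulated so far on each pass through the loop; it is not inherent to for loops as such. As a rough sketch (the name answer2 is illustrative, and timings will vary with machine and R version), a loop that fills a preallocated matrix gives the same result without the repeated copying:

```r
mat <- matrix(abs(rnorm(10000)), 1000, 10)

# Allocate the full result once, then fill it in row by row
answer2 <- matrix(NA_real_, nrow(mat), ncol(mat))
for (i in seq_len(nrow(mat))) {
    answer2[i, ] <- log(mat[i, ])
}

# Same values as the apply() version shown above
answer1 <- t(apply(mat, 1, log))
stopifnot(isTRUE(all.equal(answer2, answer1)))
```

Wrapping either version in system.time() should show whether the preallocated loop scales roughly linearly with the number of rows on your system.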

The lack of scalability of this paradigm often leads new users to believe that
R can't handle large problems.  Learning to use the apply family of functions
from the start avoids this misconception.

 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu




On Thu, 10 Mar 2011, William Dunlap wrote:

>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of
>> rex.dwyer at syngenta.com
>> Sent: Thursday, March 10, 2011 8:47 AM
>> To: ligges at statistik.tu-dortmund.de; arun.kumar.saha at gmail.com
>> Cc: r-help at r-project.org
>> Subject: Re: [R] using lapply
>>
>> But no one answered Kushan's question about performance
>> implications of for-loop vs lapply.
>> With apologies to George Orwell:
>> "for-loops BAAAAAAD, no loops GOOOOOOD."
>
> While using no loops is faster, lapply has
> a loop in it and isn't much different in
> speed from the equivalent for loop.  The big
> advantage of the *apply functions is that
> they can make your code easier to understand.
> Here are some times for various ways of computing
> log(1:1000000).  This example is probably close
> to a worst-case scenario for the for loop, since
> the time is dominated by the [<- operation.
> Using the various *apply functions can get you a
> speed-up of c. 4x, which is nice, but the vectorized
> log gives a speed-up of c. 15x over the fastest of
> the loops.  I think the for-loop method is ungainly
> because it obscures the flow of the data, but there is
> no accounting for taste.
>
>  > system.time({ val.for <- numeric(1e6); for(i in seq_len(1e6)) val.for[i] <- log(i) })
>     user  system elapsed
>     7.03    0.02    7.19
>  > system.time({ val.sapply <- sapply(seq_len(1e6), log) })
>     user  system elapsed
>     6.59    0.03    6.80
>  > system.time({ val.lapply <- unlist(lapply(seq_len(1e6), log)) })
>     user  system elapsed
>     2.48    0.00    2.52
>  > system.time({ val.vapply <- vapply(seq_len(1e6), log, FUN.VALUE=0) })
>     user  system elapsed
>     1.74    0.00    1.76
>  > system.time({ val.log <- log(seq_len(1e6)) })
>     user  system elapsed
>     0.12    0.00    0.12
>  > identical(val.vapply,val.sapply) && identical(val.vapply,val.for) && identical(val.vapply,val.lapply) && identical(val.vapply,val.log)
>  [1] TRUE
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Uwe Ligges
>> Sent: Thursday, March 10, 2011 4:38 AM
>> To: Arun Kumar Saha
>> Cc: r-help at r-project.org
>> Subject: Re: [R] using lapply
>>
>>
>>
>> On 10.03.2011 08:30, Arun Kumar Saha wrote:
>>> On reply to the post
>>> http://r.789695.n4.nabble.com/using-lapply-td3345268.html
>>
>> Hmmm, can you please reply to the original post and quote it?
>> Your mail was not recognized as being in the same thread as the message of
>> the original poster (and hence I wasted time answering it again).
>>
>> Thanks,
>> Uwe Ligges
>>
>>
>>
>>
>>> Dear Kushan, this may be a good start:
>>>
>>> ## assuming 'instr.list' is your list object and you are applying the
>>> ## my.strat() function to each element of that list, you can use lapply as
>>> lapply(instr.list, function(x) return(my.strat(x)))
>>>
>>> Here the result will again be a list, with the same length as your
>>> original list 'instr.list'.
>>>
>>> If instead my.strat() returns a single number, you might want a vector
>>> rather than a list; in that case just use 'sapply'.
>>>
>>> HTH
>>>
>>> Arun,
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>


