[R] using lapply
Phil Spector
spector at stat.berkeley.edu
Thu Mar 10 18:51:53 CET 2011
To add to William's remarks, another advantage of the
apply family of functions is that they avoid growing an
object inside a loop, which is very inefficient in R.
In other words, without the *apply functions, users
might do something like this:
answer = NULL
for(i in 1:nrows)
answer = rbind(answer,calculateanewrow(i))
or
answer = NULL
for(i in 1:n)
answer = c(answer,newcalculation(i))
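For comparison, here is a minimal sketch of the second pattern above next to two alternatives that allocate the result only once: filling a preallocated vector in a for loop, and vapply(). newcalculation() is a hypothetical stand-in that just squares its argument, and n is fixed at 1000 for illustration.

```r
# Hypothetical stand-in for the per-iteration work in the loops above
newcalculation <- function(i) i^2

n <- 1000

# 1. Growing the result: each c() call builds a new, longer vector
#    and copies everything computed so far into it
answer.grow <- NULL
for (i in 1:n)
  answer.grow <- c(answer.grow, newcalculation(i))

# 2. Preallocating: the storage is created once and filled element by element
answer.prealloc <- numeric(n)
for (i in seq_len(n))
  answer.prealloc[i] <- newcalculation(i)

# 3. An apply-family function allocates the result for you
answer.vapply <- vapply(seq_len(n), newcalculation, FUN.VALUE = numeric(1))

identical(answer.grow, answer.prealloc) && identical(answer.grow, answer.vapply)
# [1] TRUE
```

All three give the same vector; only the first one recopies its growing result on every iteration.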
So in addition to making your program easier to understand
(which is a huge advantage in and of itself), they
also help you avoid a programming paradigm that's
very inefficient in R:
> mat = matrix(abs(rnorm(10000)),1000,10)
> system.time({answer=NULL;for(i in 1:nrow(mat))answer = rbind(answer,log(mat[i,]))})
user system elapsed
0.052 0.020 0.072
> system.time({answer1 = t(apply(mat,1,log))})
user system elapsed
0.012 0.000 0.012
> all.equal(answer,answer1)
[1] TRUE
That's a speedup of a factor of 6, which gets even bigger as the size
of the object increases:
> mat = matrix(abs(rnorm(100000)),10000,10)
> system.time({answer=NULL;for(i in 1:nrow(mat))answer = rbind(answer,log(mat[i,]))})
user system elapsed
5.960 1.524 7.505
> system.time({answer1 = t(apply(mat,1,log))})
user system elapsed
0.120 0.004 0.123
> all.equal(answer,answer1)
[1] TRUE
Now it's a speedup of 60 -- essentially an O(n^2) algorithm competing with
an O(n) algorithm.
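The O(n^2) figure can be checked with a rough count of how many numbers get written. This is a simplified model that assumes each rbind() call copies the whole accumulated matrix and ignores allocation and interpreter overhead: growing an n-row, k-column result one row at a time writes k + 2k + ... + nk = k*n*(n+1)/2 numbers, while filling a preallocated result writes n*k.

```r
# Simplified model: at step i, rbind() writes the (i - 1) rows already
# accumulated plus the new row, i.e. i * k numbers
copies.grown <- function(n, k) k * n * (n + 1) / 2

# A preallocated n x k result (as apply() builds) writes each number once
copies.prealloc <- function(n, k) n * k

# Going from 1000 to 10000 rows with k = 10, as in the examples above
copies.grown(10000, 10) / copies.grown(1000, 10)        # about 100
copies.prealloc(10000, 10) / copies.prealloc(1000, 10)  # exactly 10
```

That roughly 100-fold increase in copying is in line with the rbind() loop going from 0.072 to 7.505 seconds elapsed, while the apply() version grew by about a factor of 10 (0.012 to 0.123).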
The lack of scalability of this paradigm often leads new users to believe that
R can't handle large problems. Learning to use the apply family of functions
from the start avoids this misconception.
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu
On Thu, 10 Mar 2011, William Dunlap wrote:
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of
>> rex.dwyer at syngenta.com
>> Sent: Thursday, March 10, 2011 8:47 AM
>> To: ligges at statistik.tu-dortmund.de; arun.kumar.saha at gmail.com
>> Cc: r-help at r-project.org
>> Subject: Re: [R] using lapply
>>
>> But no one answered Kushan's question about performance
>> implications of for-loop vs lapply.
>> With apologies to George Orwell:
>> "for-loops BAAAAAAD, no loops GOOOOOOD."
>
> While using no loops is faster, lapply has
> a loop in it and isn't much different in
> speed from the equivalent for loop. The big
> advantage of the *apply functions is that
> they can make your code easier to understand.
> Here are some times for various ways of computing
> log(1:1000000). This example is probably close
> to a worst-case scenario for the for loop, since
> the time is dominated by the [<- operation.
> Using the various *apply functions can get you a
> speed-up of c. 4x, which is nice, but the vectorized
> log gives a speed-up of c. 15x over the fastest of
> the loops. I think the for-loop method is ungainly
> because it obscures the flow of the data, but there is
> no accounting for taste.
>
> > system.time({ val.for <- numeric(1e6); for(i in seq_len(1e6)) val.for[i] <- log(i) })
> user system elapsed
> 7.03 0.02 7.19
> > system.time({ val.sapply <- sapply(seq_len(1e6), log) })
> user system elapsed
> 6.59 0.03 6.80
> > system.time({ val.lapply <- unlist(lapply(seq_len(1e6), log)) })
> user system elapsed
> 2.48 0.00 2.52
> > system.time({ val.vapply <- vapply(seq_len(1e6), log, FUN.VALUE=0) })
> user system elapsed
> 1.74 0.00 1.76
> > system.time({ val.log <- log(seq_len(1e6)) })
> user system elapsed
> 0.12 0.00 0.12
> > identical(val.vapply,val.sapply) && identical(val.vapply,val.for) &&
> identical(val.vapply,val.lapply) && identical(val.vapply,val.log)
> [1] TRUE
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
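A minimal sketch of what "lapply has a loop in it" means: a simplified, hypothetical my.lapply() written as an R for loop over a preallocated list. It only mirrors lapply()'s results for ordinary vectors and lists. The real lapply() runs its loop in internal C code, but it still makes one R function call per element, which is consistent with the timings above: the loop-based versions take seconds, while the single vectorized log() call takes a fraction of a second.

```r
# Hypothetical simplified lapply(): a for loop over a preallocated list
my.lapply <- function(X, FUN, ...) {
  FUN <- match.fun(FUN)
  out <- vector("list", length(X))   # allocate the result once
  for (i in seq_along(X))
    out[[i]] <- FUN(X[[i]], ...)     # one R function call per element
  names(out) <- names(X)             # carry over names, as lapply() does
  out
}

x <- seq_len(10)
identical(my.lapply(x, log), lapply(x, log))
# [1] TRUE
```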
>>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Uwe Ligges
>> Sent: Thursday, March 10, 2011 4:38 AM
>> To: Arun Kumar Saha
>> Cc: r-help at r-project.org
>> Subject: Re: [R] using lapply
>>
>>
>>
>> On 10.03.2011 08:30, Arun Kumar Saha wrote:
>>> In reply to the post
>>> http://r.789695.n4.nabble.com/using-lapply-td3345268.html
>>
>> Hmmm, can you please reply to the original post and quote it?
>> Your mail was not recognized as part of the same thread as the message of
>> the original poster (and hence I wasted time answering it again).
>>
>> Thanks,
>> Uwe Ligges
>>
>>
>>
>>
>>> Dear Kushan, this may be a good start:
>>>
>>> ## Assuming 'instr.list' is your list object and you are applying the
>>> ## my.strat() function to each element of that list, you can use the
>>> ## lapply function as follows:
>>> lapply(instr.list, function(x) return(my.strat(x)))
>>>
>>> Here the result will be another list, with the same length as
>>> your original list 'instr.list'.
>>>
>>> If instead the object returned by my.strat() is a single number,
>>> you might want to create a vector instead of a list; in that case
>>> just use 'sapply'.
>>>
>>> HTH
>>>
>>> Arun,
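A minimal sketch of the lapply()/sapply() difference described above. instr.list and my.strat() are stand-ins invented here (a named list of numeric vectors and a function returning one number per element); substitute your own objects. vapply() is included as well because it checks that every result really is a single number.

```r
# Hypothetical stand-ins for the poster's objects
instr.list <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)
my.strat <- function(x) sum(x)   # returns a single number per element

res.list <- lapply(instr.list, my.strat)  # a list, same length as instr.list
res.vec  <- sapply(instr.list, my.strat)  # simplified to a named numeric vector
res.chk  <- vapply(instr.list, my.strat, FUN.VALUE = numeric(1))  # same, type-checked

length(res.list) == length(instr.list)    # TRUE
res.vec                                   # a = 6, b = 30, c = 5
identical(res.vec, res.chk)               # TRUE

# With an empty list, sapply() has nothing to simplify and returns list(),
# whereas vapply() still returns a zero-length numeric vector
sapply(list(), my.strat)
vapply(list(), my.strat, FUN.VALUE = numeric(1))
```

Passing my.strat directly, as here, is equivalent to wrapping it in function(x) return(my.strat(x)).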
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.