[R-SIG-Finance] Use apply/lapply/tapply functions

Patrick Burns patrick at burns-stat.com
Wed Sep 3 21:12:23 CEST 2008


Jorge,

This is not a good example at all.  First off,  using
'system.time' is easier and better (because it does
garbage collection at the start by default).  Here
are timings I get:

 > N <- 100000
 > xf <- array(0, c(N, 1))
 > xa <- array(0, c(N, 1))

 > set.seed(3)
 > system.time(xar2 <- apply(xa, 2, rnorm))
   user  system elapsed
   0.07    0.00    0.06
 > set.seed(3)
 > system.time(xar1 <- apply(xa, 1, rnorm))
   user  system elapsed
   3.23    0.00    3.24
 > set.seed(3)
 > system.time(for(i in 1:N) xf[i] <- rnorm(1))
   user  system elapsed
   2.62    0.00    2.64

The apply on the second dimension is really a
vectorized operation -- it is only calling 'rnorm'
once.  This works because:

 > rnorm(c(0,0,0))
[1] 1.4397447 0.9339142 1.9902305

'rnorm' takes the number of values to generate from the
length of its first argument when that length is greater
than one.  Using 'apply' on the first dimension takes longer
than the loop and doesn't really do anything: each call is
'rnorm(0)', which generates no values at all.
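
A minimal sketch of that behaviour, for anyone who wants to check it. The name 'xv' is introduced here purely for illustration, and no timings are quoted because they depend on the machine:

```r
# rnorm(n): when length(n) > 1, the number of draws is length(n),
# not the values stored in n.
set.seed(3)
a <- rnorm(c(0, 0, 0))   # three draws
b <- rnorm(0)            # zero draws, i.e. numeric(0)
length(a)
length(b)

# The direct vectorized call: a single call to rnorm for all N values.
N <- 100000
set.seed(3)
system.time(xv <- rnorm(N))

# apply over the single column hands rnorm the whole column at once,
# so it amounts to the same single call on a length-N vector.
xa <- array(0, c(N, 1))
set.seed(3)
xar2 <- apply(xa, 2, rnorm)
dim(xar2)
identical(as.vector(xar2), xv)
```

If all that is wanted is N normal draws, the plain 'rnorm(N)' call is the form to use; the 'apply' wrapper adds nothing to it.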

The proper 'apply' call, with the array filled with 1 so that
every call is 'rnorm(1)', would be:

 > xa1 <- array(1, c(N, 1))
 > set.seed(3)
 > system.time(xar1b <- apply(xa1, 1, rnorm))
   user  system elapsed
   4.28    0.01    4.41

which takes longer still.
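
Since 'apply' will not buy speed, the grid over p1 and p2 from the original question is mostly a matter of tidiness. One possible sketch uses 'expand.grid' and 'mapply' with 'MoreArgs'. The body of 'myfunction' below, the placeholder data and the small 3 by 4 grid are all assumptions made up for illustration; only the argument names come from the original post:

```r
# Hypothetical stand-in for the real function: only the argument
# names come from the thread; the body is invented for illustration.
myfunction <- function(dataset, p1, p2, y1, y2, y3) {
  matrix(c(p1, p2, p1 * p2, mean(dataset)), nrow = 2)
}

dataset <- matrix(1:6, nrow = 2)   # placeholder data
y1 <- 0; y2 <- 0; y3 <- 0          # placeholder constants

# All (p1, p2) combinations, one row per combination.
grid <- expand.grid(p1 = 1:3, p2 = 1:4)

# Vary p1 and p2 only; everything else is passed unchanged via MoreArgs.
res <- mapply(myfunction, p1 = grid$p1, p2 = grid$p2,
              MoreArgs = list(dataset = dataset, y1 = y1, y2 = y2, y3 = y3),
              SIMPLIFY = FALSE)

length(res)   # one result table per row of 'grid'
res[[5]]      # the result table for grid[5, ]
```

This is still a loop under the hood, so the run time should be expected to be about that of the nested 'for' loops. Any real speed-up has to come from making 'myfunction' itself faster.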

Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")

Jorge Nieves wrote:
> Ah  ok
>
> Thanks for your comments Patrick.
>
> I ran the example sent to me by Enrico Schumann, and I see quite different processing times. The question, of course, is whether the processing time scales linearly as the "loop size" increases in both scenarios.
>
> Here is Enrico Schumann's [enricoschumann at yahoo.de] email:
>
> ##################
>
> I suppose that this is rather an R-help question; however, if you can use a function from the ``apply-family'', it should usually be far faster than a loop.
>
> try
> N <- 100000
> x <- array(0,dim=c(N,1))
>
> set.seed(1284357)
> # loop
> pcm <- proc.time()
> for (i in 1:N){
> x[i] <- rnorm(1)
> }
> p1 <- proc.time()-pcm
>
> set.seed(1284357)
> # apply
> y <- array(0,dim=c(N,1))
> pcm <- proc.time()
> y <- apply(y,2,rnorm)
> p2 <- proc.time()-pcm
>
> # compare time needed
> p1
> p2
> # compare results
> sum(x!=y)
>
>
> But if you can use apply, then you probably did not really need a loop in the first place, as your procedure is not really sequential (in the sense that the computation in step i+1 requires the result from step i).
>
> ############
>  
>
> -----Original Message-----
> From: Patrick Burns [mailto:patrick at burns-stat.com] 
> Sent: Wednesday, September 03, 2008 01:48 PM
> To: Jorge Nieves
> Subject: Re: [R-SIG-Finance] Use apply/lapply/tapply functions
>
> Jorge,
>
> I'd be quite interested to see an example where 'apply' is significantly faster than the corresponding 'for' loop because I don't think that it is possible.
> 'apply' has a 'for' loop in its definition.
>
> S Poetry may be useful to you in learning R.
>
> Patrick Burns
> patrick at burns-stat.com
> +44 (0)20 8525 0696
> http://www.burns-stat.com
> (home of S Poetry and "A Guide for the Unwilling S User")
>
> Jorge Nieves wrote:
>   
>> Thanks for your suggestions.
>>
>> I just started using R recently. I am trying to find my way around the system. I tested your suggestions and the speed of "apply" is definitely better than that of the "for" loop.
>>
>> The two function inputs (out of a total of five) that I am trying to parameterize (loop through) are not time dependent. Therefore, I believe I could use some function from the apply family, but I do not know how to set it up. The references in the help do not show how to select ONLY a subset (two in this case) of the variables that go into my function. Say the function takes in (x1,p1,p2,y1,y2): my problem is to determine how to APPLY p1 and p2 only.
>>
>>
>> What will be the equivalent in the APPLY space to the following for loop code?
>>
>> for (p1 in 1:100)
>>  {
>>   for (p2 in 1:100)
>>   {
>>    test = myfunction(x1,p1,p2,y1,y2)
>>   }
>> }
>>
>> Where:
>>
>> myfunction = function (dataset, p1,p2,y1,y2,y3) {
>>
>> Line1
>> Line2
>> Line3
>> ::::::::
>> :::::::
>> :::::::
>> return(res.table)
>> }
>>  
>> res.table is an n by m matrix
>>
>> Jorge
>>
>> -----Original Message-----
>> From: Enrico Schumann [mailto:enricoschumann at yahoo.de]
>> Sent: Wednesday, September 03, 2008 11:28 AM
>> To: 'Rob Steele'; Jorge Nieves
>> Cc: r-sig-finance at stat.math.ethz.ch
>> Subject: RE: [R-SIG-Finance] Use apply/lapply/tapply functions
>>
>> I suppose that this is rather an R-help question; however, if you can use a function from the ``apply-family'', it should usually be far faster than a loop.
>>
>> try
>> N <- 100000
>> x <- array(0,dim=c(N,1))
>>
>> set.seed(1284357)
>> # loop
>> pcm <- proc.time()
>> for (i in 1:N){
>> x[i] <- rnorm(1)
>> }
>> p1 <- proc.time()-pcm
>>
>> set.seed(1284357)
>> # apply
>> y <- array(0,dim=c(N,1))
>> pcm <- proc.time()
>> y <- apply(y,2,rnorm)
>> p2 <- proc.time()-pcm
>>
>> # compare time needed
>> p1
>> p2
>> # compare results
>> sum(x!=y)
>>
>>
>> But if you can use apply, then you probably did not really need a 
>> loop in the first place, as your procedure is not really sequential 
>> (in the sense that the computation in step i+1 requires the 
>> result from step i).
>>
>>
>> -----Original Message-----
>> From: r-sig-finance-bounces at stat.math.ethz.ch
>> [mailto:r-sig-finance-bounces at stat.math.ethz.ch] On Behalf Of Rob 
>> Steele
>> Sent: Wednesday, 3 September 2008 16:52
>> To: r-sig-finance at stat.math.ethz.ch
>> Subject: Re: [R-SIG-Finance] Use apply/lapply/tapply functions
>>
>> The looping functions (apply/lapply/tapply) can make your code cleaner and easier to read but they can't speed it up.  For that you need to make the stuff in the loop faster, perhaps by vectorizing parts you're currently doing serially.
>>
>>
>> Jorge Nieves wrote:
>>   
>>     
>>> Hi,
>>>
>>> I have a function that takes in a dataset (a matrix of m rows by n 
>>> columns) and five additional "constant" parameters: p1, p2, y1, y2, y3.
>>> The function performs a series of operations and transformations on 
>>> the dataset and returns a table of results.
>>>
>>> I have tested the function repeatedly and it works fine.
>>>
>>> However, I would like to generate a grid of results from myfunction 
>>> for different values of two of the input parameters: p1 and p2.
>>>
>>> I have tried using for loops, and they work, but the computation time 
>>> is too long.  I would like to use the apply/lapply/tapply functions 
>>> to avoid using for loops, whatever works!
>>>
>>> Can someone recommend how to use these functions to parameterize only 
>>> a subset of the inputs into the function, i.e. p1 and p2?
>>>
>>> Any tips/recommendations will be appreciated.
>>>
>>>
>>>
>>> myfunction = function (dataset, p1,p2,y1,y2,y3) {
>>>
>>> Line1
>>> Line2
>>> Line3
>>> ::::::::
>>> :::::::
>>> :::::::
>>> return(res.table)
>>> }
>>>
>>>
>>>     
>>>       
>> _______________________________________________
>> R-SIG-Finance at stat.math.ethz.ch mailing list 
>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> -- Subscriber-posting only.
>> -- If you want to post, subscribe first.
>>
>>
>>   
>>     
>
>
>


