[R-SIG-Finance] Use apply/lapply/tapply functions

Wed Sep 3 22:35:45 CEST 2008

Right; sorry, I applied over the columns, not the rows.

-----Original Message-----
From: r-sig-finance-bounces at stat.math.ethz.ch
[mailto:r-sig-finance-bounces at stat.math.ethz.ch] On Behalf Of Patrick
Burns
Sent: Wednesday, September 3, 2008 21:12
To: Jorge Nieves
Cc: R-sig-finance at stat.math.ethz.ch
Subject: Re: [R-SIG-Finance] Use apply/lapply/tapply functions

Jorge,

This is not a good example at all.  First off,  using 'system.time' is
easier and better (because it does garbage collection at the start by
default).  Here are timings I get:

 > N <- 100000
 > xf <- array(0, c(N, 1))
 > xa <- array(0, c(N, 1))

 > set.seed(3)
 > system.time(xar2 <- apply(xa, 2, rnorm))
   user  system elapsed
   0.07    0.00    0.06
 > set.seed(3)
 > system.time(xar1 <- apply(xa, 1, rnorm))
   user  system elapsed
   3.23    0.00    3.24
 > set.seed(3)
 > system.time(for(i in 1:N) xf[i] <- rnorm(1))
   user  system elapsed
   2.62    0.00    2.64

The apply on the second dimension is really a vectorized operation: it
calls 'rnorm' only once.  This works because:

 > rnorm(c(0,0,0))
[1] 1.4397447 0.9339142 1.9902305

'rnorm' vectorizes on the length of the first argument if it is of length
greater than one.  Using 'apply' on the first dimension takes longer than
the loop and doesn't really do anything: each row holds a single 0, so
every call is rnorm(0), which returns no values.
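
To make the two behaviours concrete, here is a minimal sketch that counts
how many values each call to 'rnorm' produces. The small array and the
helper name draws_per_call are illustrative assumptions, not part of the
original example:

```r
# Illustrative helper (not from the thread): number of values rnorm() returns
# when handed one slice of the array as its first argument
draws_per_call <- function(slice) length(rnorm(slice))

xa <- array(0, c(5, 1))   # small version of the N x 1 array of zeros

# Over columns: a single call with a length-5 vector, so 5 draws
per_column <- apply(xa, 2, draws_per_call)

# Over rows: five calls, each with the single value 0, so 0 draws each
per_row <- apply(xa, 1, draws_per_call)

print(per_column)
print(per_row)
```

The column-wise call generates all the numbers in one go, while the row-wise
call pays for five function calls and generates nothing.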

The proper 'apply' call would be:

 > xa1 <- array(1, c(N, 1))
 > set.seed(3)
 > system.time(xar1b <- apply(xa1, 1, rnorm))
   user  system elapsed
   4.28    0.01    4.41

which takes longer still.
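
For comparison, the operation that the column-wise 'apply' reduces to can
be written directly as a single vectorized call. The following is only a
sketch with a smaller N than in the timings above (the names xf and xv are
chosen here for illustration); it checks that the loop and the vectorized
call should produce the same numbers from the same seed:

```r
N <- 1000   # smaller than in the timings above, just to keep the check quick

# Loop: N separate calls, one draw each
set.seed(3)
xf <- numeric(N)
for (i in seq_len(N)) xf[i] <- rnorm(1)

# Vectorized: one call producing all N draws
set.seed(3)
xv <- rnorm(N)

# With the same seed, both consume the random number stream in the same order
same <- identical(xf, xv)
print(same)
```

Wrapping either version in 'system.time' gives the timing comparison on
your own machine.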

Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")

Jorge Nieves wrote:
> Ah  ok
>
> Thanks for your comments Patrick.
>
> I ran the example sent to me by Enrico Schumann, and I see quite different
> processing times. The question, of course, is whether the processing time
> grows linearly with the "loop size" in both scenarios.
>
> Here is Enrico Schumann's [enricoschumann at yahoo.de] email:
>
> ##################
>
> I suppose that this is rather an r-help question; however, if you can use
> a function from the "apply family", it should usually be far faster than a
> loop.
>
> try
> N <- 100000
> x <- array(0,dim=c(N,1))
>
> set.seed(1284357)
> # loop
> pcm <- proc.time()
> for (i in 1:N){
> x[i] <- rnorm(1)
> }
> p1 <- proc.time()-pcm
>
> set.seed(1284357)
> # apply
> y <- array(0,dim=c(N,1))
> pcm <- proc.time()
> y <- apply(y,2,rnorm)
> p2 <- proc.time()-pcm
>
> # compare time needed
> p1
> p2
> # compare results
> sum(x!=y)
>
>
> But if you can use apply, then you probably did not really need a
> loop in the first place, as your procedure is not really sequential
> (in the sense that the computation in step i+1 requires the
> result from step i).
>
> ############
>  
>
> -----Original Message-----
> From: Patrick Burns [mailto:patrick at burns-stat.com]
> Sent: Wednesday, September 03, 2008 01:48 PM
> To: Jorge Nieves
> Subject: Re: [R-SIG-Finance] Use apply/lapply/tapply functions
>
> Jorge,
>
> I'd be quite interested to see an example where 'apply' is significantly
> faster than the corresponding 'for' loop because I don't think that it is
> possible.
> 'apply' has a 'for' loop in its definition.
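
One way to check this claim yourself is to look at the definition of
'apply' from within R. The following is only a sketch: the exact source of
'apply' may differ between R versions, and the variable names are chosen
here for illustration.

```r
# Turn the definition of base::apply into lines of text
apply_source <- deparse(base::apply)

# Look for a 'for (' statement anywhere in that text
has_for_loop <- any(grepl("\\bfor ?\\(", apply_source))
print(has_for_loop)
```

Typing apply at the prompt without parentheses prints the same definition
for reading.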
>
> S Poetry may be useful to you in learning R.
>
> Patrick Burns
> patrick at burns-stat.com
> +44 (0)20 8525 0696
> http://www.burns-stat.com
> (home of S Poetry and "A Guide for the Unwilling S User")
>
> Jorge Nieves wrote:
>   
>> Thanks for your suggestions.
>>
>> I just started using R recently. I am trying to figure out my way around
>> the system. I tested your suggestions and the speed of "apply" is definitely
>> better than that of the "for" loop.
>>
>> The two function inputs (out of a total five) that I am trying to
>> parameterize (loop through) are not time dependent. Therefore, I believe I
>> could use some function from the apply family, but I do not know how to set
>> it up. The references in the help do not show how to select ONLY a subset
>> (two in this case) of the variables that go into my function. Say the
>> function takes in (x1,p1,p2,y1,y2); my problem is to determine how to APPLY
>> over p1 and p2 only.
>>
>>
>> What would be the equivalent in the APPLY space of the following for loop
>> code?
>>
>> for (p1 in 1:100)
>>  {
>>   for (p2 in 1:100)
>>   {
>>    test = myfunction(x1,p1,p2,y1,y2)
>>   }
>> }
>>
>> Where:
>>
>> myfunction = function (dataset, p1,p2,y1,y2,y3) {
>>
>> Line1
>> Line2
>> Line3
>> ::::::::
>> :::::::
>> :::::::
>> return(res.table)
>> }
>>  
>> res.table is an n by m matrix
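
One possible way to set this up, sketched under stated assumptions: the
body of myfunction is not shown in the thread, so the version below is a
made-up stand-in, and x1 and the fixed values for y1, y2, y3 are invented
data. The idea is to build the grid of (p1, p2) pairs with 'expand.grid'
and let 'mapply' vary only those two arguments, passing everything else
through MoreArgs:

```r
# Hypothetical stand-in for myfunction: the real body is not shown in the thread
myfunction <- function(dataset, p1, p2, y1, y2, y3) {
  dataset * p1 + p2 + y1 + y2 + y3   # a matrix with the same shape as dataset
}

x1 <- matrix(1:6, nrow = 2)              # made-up 2 x 3 dataset
grid <- expand.grid(p1 = 1:3, p2 = 1:4)  # every (p1, p2) combination, 12 rows

# mapply steps through p1 and p2 in parallel; MoreArgs holds the arguments
# that stay fixed. SIMPLIFY = FALSE keeps one matrix per grid row in a list.
results <- mapply(myfunction, p1 = grid$p1, p2 = grid$p2,
                  MoreArgs = list(dataset = x1, y1 = 0, y2 = 0, y3 = 0),
                  SIMPLIFY = FALSE)

print(length(results))   # one result per row of grid
```

Here results[[k]] belongs to the parameter pair in row k of grid. As noted
elsewhere in this thread, this mainly tidies the code; it still calls
myfunction once per pair, so it should not be expected to run much faster
than the nested loops.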
>>
>> Jorge
>>
>> -----Original Message-----
>> From: Enrico Schumann [mailto:enricoschumann at yahoo.de]
>> Sent: Wednesday, September 03, 2008 11:28 AM
>> To: 'Rob Steele'; Jorge Nieves
>> Cc: r-sig-finance at stat.math.ethz.ch
>> Subject: Re: [R-SIG-Finance] Use apply/lapply/tapply functions
>>
>> I suppose that this is rather an r-help question; however, if you can use
>> a function from the "apply family", it should usually be far faster than a
>> loop.
>>
>> try
>> N <- 100000
>> x <- array(0,dim=c(N,1))
>>
>> set.seed(1284357)
>> # loop
>> pcm <- proc.time()
>> for (i in 1:N){
>> x[i] <- rnorm(1)
>> }
>> p1 <- proc.time()-pcm
>>
>> set.seed(1284357)
>> # apply
>> y <- array(0,dim=c(N,1))
>> pcm <- proc.time()
>> y <- apply(y,2,rnorm)
>> p2 <- proc.time()-pcm
>>
>> # compare time needed
>> p1
>> p2
>> # compare results
>> sum(x!=y)
>>
>>
>> But if you can use apply, then you probably did not really need a
>> loop in the first place, as your procedure is not really sequential
>> (in the sense that the computation in step i+1 requires the
>> result from step i).
>>
>>
>> -----Original Message-----
>> From: r-sig-finance-bounces at stat.math.ethz.ch
>> [mailto:r-sig-finance-bounces at stat.math.ethz.ch] On Behalf Of Rob
>> Steele
>> Sent: Wednesday, September 3, 2008 16:52
>> To: r-sig-finance at stat.math.ethz.ch
>> Subject: Re: [R-SIG-Finance] Use apply/lapply/tapply functions
>>
>> The looping functions (apply/lapply/tapply) can make your code cleaner
>> and easier to read but they can't speed it up.  For that you need to make
>> the stuff in the loop faster, perhaps by vectorizing parts you're currently
>> doing serially.
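
As a small illustration of what vectorizing a serial step can look like
(a sketch with made-up data and an arbitrary formula, not taken from the
function under discussion):

```r
x <- c(1.5, 2, 3.5, 4)   # made-up input values

# Serial: one element per iteration
out_loop <- numeric(length(x))
for (i in seq_along(x)) {
  out_loop[i] <- x[i]^2 + 1
}

# Vectorized: the same arithmetic applied to the whole vector at once
out_vec <- x^2 + 1

print(all.equal(out_loop, out_vec))
```

The vectorized line does the element-by-element work in compiled code
rather than in interpreted R, which is where the speed-up comes from.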
>>
>>
>> Jorge Nieves wrote:
>>   
>>     
>>> Hi,
>>>
>>> I have  a function that takes in a dataset ( a matrix of m rows by n 
>>> columns), and five additional "constant" parameters, p1,p2,y1,y2,y3.
>>> The function performs a series of operations and transformations on
>>> the dataset, and returns a table of results.
>>>
>>> I have tested the function repeatedly and it works fine.
>>>
>>> However, I would like to generate a grid of results from myfunction 
>>> for different values of two of the input parameters: p1, and p2.
>>>
>>> I have tried using for loops, and they work, but the computation
>>> time is too long.  I would like to use the apply/lapply/tapply
>>> functions to avoid using for loops, whatever works!
>>>
>>> Can someone recommend how to use these functions to parameterize only
>>> a subset of the inputs into the function, i.e. p1 and p2?
>>>
>>> Any tips/recommendations will be appreciated.
>>>
>>>
>>>
>>> myfunction = function (dataset, p1,p2,y1,y2,y3) {
>>>
>>> Line1
>>> Line2
>>> Line3
>>> ::::::::
>>> :::::::
>>> :::::::
>>> return(res.table)
>>> }
>>>
>>>
>>>     
>>>       
>> _______________________________________________
>> R-SIG-Finance at stat.math.ethz.ch mailing list 
>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> -- Subscriber-posting only.
>> -- If you want to post, subscribe first.
>
>
>
