[R-SIG-Finance] Use apply/lapply/tapply functions
Brian G. Peterson
brian at braverock.com
Wed Sep 3 22:57:05 CEST 2008
As others have already hinted, you really need to rework the *inside*
of your function so that it does not repeat code that doesn't need to be
repeated. Either apply or a loop might be a reasonable answer there, but
you'll first need to dissect your code to figure out what can be looped
inside the function, while avoiding recomputing anything that has
already been calculated. For example, many calculations can be
vectorized or turned into matrix operations. Then pass in a vector of
values for p1 and p2, and have the internals of the function do the work.
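
As a rough sketch of the bookkeeping side of this (not the poster's actual
function, whose internals were not shown), a toy stand-in for myfunction can
be evaluated over a grid of p1 and p2 values with mapply, holding the other
inputs fixed through MoreArgs. The toy function body, the grid sizes, and the
values of x1, y1 and y2 are all assumptions made up for illustration.

```r
# Hypothetical stand-in for the poster's function; the real internals are unknown.
myfunction <- function(dataset, p1, p2, y1, y2) {
  # toy computation that returns a matrix the same shape as the dataset
  (dataset * p1 + y1) / (p2 + y2)
}

# Toy inputs (assumed values, for illustration only)
x1 <- matrix(1:6, nrow = 2)
y1 <- 0.5
y2 <- 2

# Every (p1, p2) combination, one per row; p1 varies fastest
grid <- expand.grid(p1 = 1:3, p2 = 1:4)

# mapply steps through p1 and p2 in parallel; MoreArgs carries the
# inputs that stay the same on every call
results <- mapply(myfunction, p1 = grid$p1, p2 = grid$p2,
                  MoreArgs = list(dataset = x1, y1 = y1, y2 = y2),
                  SIMPLIFY = FALSE)

# results is a list with one result matrix per row of grid
length(results) == nrow(grid)
```

This only tidies up the iteration over p1 and p2. Any real speedup would
still have to come from reworking what happens inside the function.
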
Without the actual internals of your function, no one here is going to
be able to speak in more than generalities. If you're replicating a
published paper, perhaps you should consider that the model you are
evaluating is already "in the wild", and everyone could benefit from an
optimized R implementation of it. If, of course, you're creating a new model
from scratch, then we all understand that you cannot show the internals.
Regards,
- Brian
Jorge Nieves wrote:
> Thanks for your suggestions.
>
> I just started using R recently. I am trying to figure out my way around the system. I tested your suggestions and the speed of "apply" is definitely better than that of the "for" loop.
>
> The two function inputs (out of a total of five) that I am trying to parameterize (loop through) are not time dependent. Therefore, I believe I could use some function from the apply family, but I do not know how to set it up. The references in the help do not show how to select ONLY a subset (two in this case) of the variables that go into my function. Say the function takes in (x1,p1,p2,y1,y2); my problem is to determine how to APPLY over p1 and p2 only.
>
>
> What would be the equivalent in the APPLY space of the following for loop code?
>
> for (p1 in 1:100)
> {
>   for (p2 in 1:100)
>   {
>     test = myfunction(x1,p1,p2,y1,y2)
>   }
> }
>
> Where:
>
> myfunction = function (dataset, p1,p2,y1,y2,y3)
> {
>
> Line1
> Line2
> Line3
> ::::::::
> :::::::
> :::::::
> return(res.table)
> }
>
> res.table is an n by m matrix
>
> Jorge
>
> -----Original Message-----
> From: Enrico Schumann [mailto:enricoschumann at yahoo.de]
> Sent: Wednesday, September 03, 2008 11:28 AM
> To: 'Rob Steele'; Jorge Nieves
> Cc: r-sig-finance at stat.math.ethz.ch
> Subject: Re: [R-SIG-Finance] Use apply/lapply/tapply functions
>
> I suppose that this is rather an r-help question; however, if you can use a function from the "apply family", it should usually be far faster than a loop.
>
> Try:
> N <- 100000
> x <- array(0,dim=c(N,1))
>
> set.seed(1284357)
> # loop
> pcm <- proc.time()
> for (i in 1:N){
> x[i] <- rnorm(1)
> }
> p1 <- proc.time()-pcm
>
> set.seed(1284357)
> # apply
> y <- array(0,dim=c(N,1))
> pcm <- proc.time()
> y <- apply(y,2,rnorm)
> p2 <- proc.time()-pcm
>
> # compare time needed
> p1
> p2
> # compare results
> sum(x!=y)
>
>
> But if you can use apply, then you probably did not really need a loop in the first place, as your procedure is not really sequential (in the sense that the computation in step i+1 really requires the result from step i).
>
>
> -----Original Message-----
> From: r-sig-finance-bounces at stat.math.ethz.ch
> [mailto:r-sig-finance-bounces at stat.math.ethz.ch] On Behalf Of Rob Steele
> Sent: Wednesday, 3 September 2008 16:52
> To: r-sig-finance at stat.math.ethz.ch
> Subject: Re: [R-SIG-Finance] Use apply/lapply/tapply functions
>
> The looping functions (apply/lapply/tapply) can make your code cleaner and easier to read but they can't speed it up. For that you need to make the stuff in the loop faster, perhaps by vectorizing parts you're currently doing serially.
>
>
> Jorge Nieves wrote:
>> Hi,
>>
>> I have a function that takes in a dataset (a matrix of m rows by n
>> columns), and five additional "constant" parameters: p1, p2, y1, y2, y3.
>> The function performs a series of operations and transformations on the
>> dataset, and returns a table of results.
>>
>> I have tested the function repeatedly and it works fine.
>>
>> However, I would like to generate a grid of results from myfunction
>> for different values of two of the input parameters: p1, and p2.
>>
>> I have tried using for loops, and they work, but the computation time
>> is too long. I would like to use the apply/lapply/tapply functions
>> to avoid using for loops, or whatever works!
>>
>> Can someone recommend how to use these functions to parameterize only a
>> subset of the inputs into the function, i.e., p1 and p2?
>>
>> Any tips/recommendations will be appreciated.
>>
>>
>>
>> myfunction = function (dataset, p1,p2,y1,y2,y3) {
>>
>> Line1
>> Line2
>> Line3
>> ::::::::
>> :::::::
>> :::::::
>> return(res.table)
>> }
>>
>>
>
> _______________________________________________
> R-SIG-Finance at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only.
> -- If you want to post, subscribe first.