[R] Alternatives for explicit for() loops
Boris Steipe
boris.steipe at utoronto.ca
Mon Nov 9 00:32:00 CET 2015
While I fully agree with Jim's comments, you may also need to understand the notion of time complexity in algorithm analysis. All the mentioned speed-ups are basically linear, in the sense that they accelerate a single step of your algorithm. However if your algorithm has combinatorial complexity, any amount of linear speed-up may only increase the tractable problem size by a trivial amount. I had proposed earlier that you write a small set of test examples and properly analyze the complexity - how does run time scale with your problem size. That gives you a rational basis to analyze
- how the improvements in your code affect the run time
- how the improvements in your code affect the tractable problem size.
If this shows you that the problem won't go away, then some thoughts about optimization for computationally hard problems will be in order. These may include substituting a heuristic for an exact algorithm, or approximating solutions by stochastic sampling in large spaces.
B.
PS: and for real now: format your code, add comments, don't post in HTML, create an MWE ...
On Nov 8, 2015, at 6:13 PM, jim holtman <jholtman at gmail.com> wrote:
> You need to take a close look at the function incomb that you are
> creating. I see what appears to be a constant value ("*(
> gamma((1/beta)+1))*((alpha)^(-(1/beta)))") being computed that you might
> only have to compute once before the function. You are also referencing
> many variables (m, LED, j, ...) that are not being passed in that are in
> the global environment; it is probably better to pass them in to the
> function. I am not sure what 'pbapply' is doing for you since I see this
> is new to the code that you first sent out.
>
> I would be good it you told us what the function is trying to do; you are
> showing us how you want to do it, not what you want to do. Are there other
> ways of doing it? If speed is your problem, then consider the "Rcpp"
> package and write the function is C++ which might be faster, but again,
> take a look at what you are doing to see if there are other ways. I don't
> have time to dig into the code, since there is a lack of comments, to
> understand why you are using, e.g., 'choose', 'prod', etc.). There are
> probably a lot of ways of speeding up the code, if could tell us what you
> want to accomplish.
>
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> On Sun, Nov 8, 2015 at 4:48 AM, Maram SAlem <marammagdysalem at gmail.com>
> wrote:
>
>> Thanks all for replying.
>>
>> In fact I've used the the Rprof() function and found out that the
>> incomb() function (in my code above) takes about 80% of the time, but I
>> didn't figure out which part of the function is causing the delay. So I
>> thought that this may be due to the for() loops.
>> I MUST run this code for rather large values of n and m, so is there any
>> way that can help me do that without having to wait for more than three
>> days to reach an output. N.B. I'll have to repeat these runs for may be 70
>> or 80 times , and this means HUGE time
>>
>> I'd appreciate any sort of help.
>> Thanks in advance.
>>
>> Maram Salem
>>
>> On 6 November 2015 at 16:54, jim holtman <jholtman at gmail.com> wrote:
>>
>>> If you have code that is running for a long time, then take a small case
>>> that only runs for 5-10 minutes and turn on the RProfiler so that you can
>>> see where you are spending your time. In most cases, it is probably not
>>> the 'for' loops that are causing the problem, but some function/calculation
>>> you are doing within the loop that is consuming the time, and until you
>>> determine what section of code that is, is it hard to tell exactly what the
>>> problem is, much less the solution.
>>>
>>>
>>> Jim Holtman
>>> Data Munger Guru
>>>
>>> What is the problem that you are trying to solve?
>>> Tell me what you want to do, not how you want to do it.
>>>
>>> On Wed, Nov 4, 2015 at 9:09 AM, Maram SAlem <marammagdysalem at gmail.com>
>>> wrote:
>>>
>>>> Hi Jim,
>>>>
>>>> Thanks a lot for replying.
>>>>
>>>> In fact I'm trying to run a simulation study that enables me to
>>>> calculate the Bayes risk of a sampling plan selected from progressively
>>>> type-II censored Weibull model. One of the steps involves evaluating the
>>>> expected test time, which is a rather complicated formula that involves
>>>> nested multiple summations where the counters of the summation signs are
>>>> dependent, that's why I thought of I should create the incomb() function
>>>> inside the loop, or may be I didn't figure out how to relate its arguments
>>>> to the ones inside the loop had I created it outside it. I'm trying to
>>>> create a matrix of all the possible combinations involved in the summations
>>>> and then use the apply() function on each row of that matrix. The problem
>>>> is that the code I wrote works perfectly well for rather small values of
>>>> the sample size,n, and the censoring number, m (for example, n=8,m=4),but
>>>> when n and m are increased (say, n=25,m=15) the code keeps on running for
>>>> days with no output. That's why I thought I should try to avoid explicit
>>>> loops as much as possible, so I did my best in this regard but still the
>>>> code takes too long to execute,(more than three days), thus, i believe
>>>> there must be something wrong.
>>>>
>>>> Here's the full code:
>>>>
>>>> library(pbapply)
>>>> f1 <- function(n, m) {
>>>> stopifnot(n > m)
>>>> r0 <- t(diff(combn(n-1, m-1)) - 1L)
>>>> r1 <- rep(seq(from=0, len=n-m+1), choose( seq(to=m-2, by=-1,
>>>> len=n-m+1), m-2))
>>>> cbind(r0[, ncol(r0):1, drop=FALSE], r1, deparse.level=0)
>>>> }
>>>> simpfun<- function (x,n,m,p,alpha,beta)
>>>> {
>>>> a<-factorial(n-m)/(prod((factorial(x)))*(factorial((n-m)- sum(x))))
>>>> b <- ((m-1):1)
>>>> c<- a*((p)^(sum(x)))*((1-p)^(((m-1)*(n-m))- sum(x%*%(as.matrix(b)))))
>>>> d <- n - cumsum(x) - (1:(m-1))
>>>> e<- n*(prod(d))*c
>>>> LD<-list()
>>>> for (i in 1:(m-1)) {
>>>> LD[[i]]<-seq(0,x[i],1)
>>>> }
>>>> LD[[m]]<-seq(0,(n-m-sum(x)),1)
>>>> LED<-expand.grid (LD)
>>>> LED<-as.matrix(LED)
>>>> store1<-numeric(nrow(LED))
>>>> for (j in 1:length(store1) )
>>>> {
>>>> incomb<-function(x,alpha,beta) {
>>>>
>>>> g<-((-1)^(sum(LED[j,])))*(gamma((1/beta)+1))*((alpha)^(-(1/beta)))
>>>> h <- choose(x, LED[j,-m])
>>>> ik<-prod(h)*choose((n-m-sum(x)),LED[j,m])
>>>> lm<-cumsum(LED[j,-m]) + (1:(m-1))
>>>> plm<-prod(lm)
>>>> gil<-g*ik/(plm)
>>>> hlm<-numeric(sum(LED[j,])+(m-1))
>>>> dsa<-length(hlm)
>>>> for (i in 1:dsa)
>>>> {
>>>> ppp<- sum(LED[j,])+(m-1)
>>>> hlm[i]<-
>>>> (choose(ppp,i))*((-1)^(i))*((i+1)^((-1)*((1/beta)+1)))
>>>> }
>>>> shl<-gil*(sum(hlm)+1)
>>>> return (shl)
>>>> }
>>>> store1[j]<-incomb(x,alpha=0.2,beta=2)
>>>> }
>>>> val1<- sum(store1)*e
>>>> return(val1)
>>>> }
>>>>
>>>> va<-pbapply(s,1,simpfun,n=6,m=4,p=0.3,alpha=0.2,beta=2)
>>>> EXP<-sum(va)
>>>>
>>>>
>>>>
>>>> Any help would be greatly appreciated.
>>>> Thanks a lot for your time.
>>>>
>>>> Best Regards,
>>>> Maram Salem
>>>>
>>>>
>>>> On 2 November 2015 at 00:27, jim holtman <jholtman at gmail.com> wrote:
>>>>
>>>>> Why are you recreating the incomb function within the loop instead of
>>>>> defining it outside the loop? Also you are referencing several variables
>>>>> that are global (e.g., m & j); you should be passing these in as parameters
>>>>> to the function.
>>>>>
>>>>>
>>>>> Jim Holtman
>>>>> Data Munger Guru
>>>>>
>>>>> What is the problem that you are trying to solve?
>>>>> Tell me what you want to do, not how you want to do it.
>>>>>
>>>>> On Sun, Nov 1, 2015 at 7:31 AM, Maram SAlem <marammagdysalem at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I'm writing a long code that takes long time to execute. So I used the
>>>>>> Rprof() function and found out that the function that takes about 80%
>>>>>> of
>>>>>> the time is the incomb () fucntion (below), and this is most probably
>>>>>> because of the many explicit for() loops I'm using.
>>>>>>
>>>>>> n=18;m=4;p=0.3;alpha=0.2;beta=2
>>>>>> x=c(3,0,0)
>>>>>> LD<-list()
>>>>>> for (i in 1:(m-1)) {
>>>>>> LD[[i]]<-seq(0,x[i],1)
>>>>>> }
>>>>>> LD[[m]]<-seq(0,(n-m-sum(x)),1)
>>>>>> LED<-expand.grid (LD)
>>>>>> LED<-as.matrix(LED)
>>>>>> store1<-numeric(nrow(LED))
>>>>>> h<- numeric(m-1)
>>>>>> lm<- numeric(m-1)
>>>>>> for (j in 1:length(store1) )
>>>>>> {
>>>>>> incomb<-function(x,alpha,beta) {
>>>>>>
>>>>>> g<-((-1)^(sum(LED[j,])))*(gamma((1/beta)+1))*((alpha)^(-(1/beta)))
>>>>>> for (i in 1:(m-1)) {
>>>>>> h[i]<- choose(x[i],LED[j,i])
>>>>>> }
>>>>>> ik<-prod(h)*choose((n-m-sum(x)),LED[j,m])
>>>>>> for (i in 1:(m-1)) {
>>>>>> lm[i]<-(sum(LED[j,1:i])) + i
>>>>>> }
>>>>>> plm<-prod(lm)
>>>>>> gil<-g*ik/(plm)
>>>>>> hlm<-numeric(sum(LED[j,])+(m-1))
>>>>>> dsa<-length(hlm)
>>>>>> for (i in 1:dsa)
>>>>>> {
>>>>>> ppp<- sum(LED[j,])+(m-1)
>>>>>> hlm[i]<-
>>>>>> (choose(ppp,i))*((-1)^(i))*((i+1)^((-1)*((1/beta)+1)))
>>>>>> }
>>>>>> shl<-gil*(sum(hlm)+1)
>>>>>> return (shl)
>>>>>> }
>>>>>> store1[j]<-incomb(x,alpha=0.2,beta=2)
>>>>>> }
>>>>>>
>>>>>>
>>>>>> I'm trying to use alternatives (for ex. to vectorize things) to the
>>>>>> explicit for() loops, but things don't work out.
>>>>>>
>>>>>> Any suggestions that can help me to speed up the execution of the
>>>>>> incomb()
>>>>>> function are much appreciated.
>>>>>>
>>>>>> Thanks a lot in advance.
>>>>>>
>>>>>> Maram Salem
>>>>>>
>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list