[R-sig-hpc] Parallel version of the code below - Dual core or more

Stephen Weston stephen.b.weston at gmail.com
Thu Dec 10 15:26:07 CET 2009


I was trying to say that it is possible to parallelize iterative solutions
in some cases.  For instance, if you're starting from a solution that is
based on randomness, and iteratively improving it, you could
parallelize it in this fashion:

    library(doSNOW)      # also loads snow and foreach
    library(iterators)   # for idiv()

    # Create, initialize, and register a snow socket cluster
    cl <- makeSOCKcluster(2)
    clusterSetupRNG(cl)
    registerDoSNOW(cl)

    # Define a simple search function
    lookForBestSolution <- function(itermax) {
      # Pick a starting solution based on randomness
      cursol <- pickRandomSolution()

      # Repeatedly try to improve the solution
      for (i in seq_len(itermax)) {
        x <- tryToImproveSolution(cursol)
        cursol <- bestSolution(x, cursol)
      }

      # Return the best solution found
      cursol
    }

    # Define some very silly example functions for purposes of illustration
    pickRandomSolution <- function() rnorm(1)
    tryToImproveSolution <- function(x) x - rnorm(1)
    bestSolution <- function(...) min(...)

    # Define the total number of times you're willing to improve your solution
    itermax <- 500

    # Create an iterator that returns one value per worker, with the
    # values summing to itermax
    it <- idiv(itermax, chunks=getDoParWorkers())

    # Pick the best solution that the cluster workers find
    bestsol <- foreach(i=it, .combine='bestSolution') %dopar% {
      lookForBestSolution(i)
    }
    print(bestsol)

    # Shut down the socket cluster
    stopCluster(cl)


Note that the call to clusterSetupRNG is essential for programs
of this type: it gives each worker its own independent random number
stream (by default it uses the rlecuyer package).  Without it, the
workers may all compute exactly the same result, and you gain nothing
from the parallelism.
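
For example, you can check what the workers will generate by calling

    clusterCall(cl, runif, 3)

before stopping the cluster: without the clusterSetupRNG call the two
workers may return identical vectors, while with it each worker draws
from its own independent stream.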

- Steve


On Wed, Dec 9, 2009 at 10:39 PM, Debabrata Midya
<Debabrata.Midya at services.nsw.gov.au> wrote:
> Steve, Martin, Tim,
>
> Thanks to all of you to give some time and ideas on it.
>
> You are correct, Steve: the "real" problem is bigger, at least in the
> number of dimensions (in my example code d = 3).
>
> The code I provided is just an example to start with. Although the real
> problem will not differ much in concept from the example code, I have
> described the whole problem below in A and B.
>
> How can I make a parallel version of it?
>
> A. I have described a variable called cursol in my example code. In my
> final code, I would like cursol to be a matrix of order m x (d+1), where
> m is the number of candidate solutions (for example, m = 100) and d is
> the number of variables. In my example code d = 3 (trainX = c(x[1],
> x[2], x[3])). The (d+1)-th column is the value of func at trainX.
>
> At the end of j = 1 (iteration 1), the best solution (minimum value) of
> the problem is cursol[1, ]; the second best solution is cursol[2, ],
> and so on. [Note: I have ordered the matrix cursol by cursol[, d+1],
> from lowest value to highest.]
>
> B. At the start of j = 2 (iteration 2), trainX = cursol[, 1:d], and the
> iteration proceeds as in A. A rough sketch of A and B together follows.
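>
> In code, the structure I have in mind is roughly this (illustrative
> only; here m = 100 and d = 3, and trainX is an m x d matrix):
>
>    vals <- apply(trainX, 1, func)
>    cursol <- cbind(trainX, vals)               # m x (d+1)
>    cursol <- cursol[order(cursol[, d + 1]), ]  # minimum value first
>    trainX <- cursol[, 1:d]                     # start of next iteration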
>
> That means the whole problem is:
>
> func <- function(x) {
>
>    # Some calculations
>
>    value # Value of the function at some x (a row of trainX)
> }
>
> main <- function(args) {
>
>    # Some initializations, if needed
>
>    # Some calculations, if needed
>
>    # trainX: some starting / calculated value
>
>    for (i in 1:itermax)
>    {
>        # All calculations, including the call to the external
>        # function func
>
>        cursol <- ----
>
>        trainX <- cursol + runif(1) # Perturb using a random number
>    }
> }
>
> solution <- main(args)
>
> Once again, thank you very much for the time you have given.
> Regards,
> Deb
>
>>>> Stephen Weston <stephen.b.weston at gmail.com> 10/12/2009 1:13 pm >>>
> I was thinking that you'd have one task per worker, and that each
> task would run a smaller number of iterations and return its
> best result.  The final result would be the best result of all.
> Of course, you'd probably have to do more total iterations in order
> to get as good a result as in the sequential version, but I think
> you'd still get some benefit from running in parallel.
>
> I haven't looked over the code too carefully, so maybe I'm
> misreading it.  I was also assuming that the "real" problem
> was bigger, since this version looks like it would run very quickly
> sequentially, and so wouldn't be worth doing in parallel.
>
> Also, checkval should be vectorized using pmax and pmin.
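>
> For example, something like
>
>   checkval <- function(x, xl, xu) pmin(pmax(x, xl), xu)
>
> clamps the whole vector at once, with no loop.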
>
> - Steve
>
>
> On Wed, Dec 9, 2009 at 8:22 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> Hi Deb, Steve,
>>
>> Stephen Weston wrote:
>>> Using doSNOW with a socket cluster is probably your best bet.
>>>
>>> - Steve
>>>
>>>
>>> On Wed, Dec 9, 2009 at 6:52 PM, Debabrata Midya
>>> <Debabrata.Midya at services.nsw.gov.au> wrote:
>>>> Tim,
>>>>
>>>> Thanks for your reply.
>>>>
>>>> One clarification please.
>>>>
>>>> I am using R 2.10.0 on Windows XP, Intel(R) Core(TM) 2 Duo CPU E8400
>>>> @ 3.00GHz, 1.96 GB of RAM.
>>>> I have checked that a Windows binary of doMC is not available. Any advice?
>>>>
>>>> Once again, thank you very much for the time you have given.
>>>>
>>>> Regards,
>>>>
>>>> Deb
>>>>
>>>>>>> Tim Triche <tim.triche at gmail.com> 10/12/2009 3:54 am >>>
>>>> library(foreach)
>>>> library(doMC) # or doWhatever, take your pick
>>>> registerDoMC(4)
>>>>
>>>> results <- foreach(j = 1:itermax) %dopar% {
>>>>
>>>>   # stuff
>>>>   if (converged) break
>>>>
>>>> }
>>
>> Unfortunately, the iterations are not independent; it's instructive to
>> try to get something sensible out of
>>
>>  res0 <- 0
>>  tmp <- foreach (j = 1:10) %dopar% { res0 <- res0 + j }
>>  res0
>>
>> versus what you might expect
>>
>>  res1 <- 0
>>  for (j in 1:10) { res1 <- res1 + j }
>>  res1
>>
>> and to understand how Deb's problem maps to this case.
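>>
>> (A foreach version that does work pushes the accumulation into
>> .combine, e.g.
>>
>>  res2 <- foreach (j = 1:10, .combine='+') %dopar% { j }
>>
>> which gives 55, the same as the sequential loop; but that only works
>> because these iterations are in fact independent.)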
>>
>> Martin
>>
>>
>>>>
>>>> The details are up to you. I don't remember how best to break out of a
>>>> foreach loop; try help('foreach').
>>>> Have fun. It's pretty straightforward once you get used to foreach()
>>>> behavior, less so before that point.
>>>> But it's a really useful tool to have in your arsenal for the problem
>>>> you describe. Try to loop over the biggest chunks of processing you can
>>>> get away with, as fork()ing processes has some overhead (see the sketch
>>>> below).
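>>>>
>>>> As a sketch of what "biggest chunks" might look like (idiv is from
>>>> the iterators package; doOneIteration is a stand-in for whatever
>>>> your loop body does):
>>>>
>>>> it <- idiv(itermax, chunks=getDoParWorkers())
>>>> results <- foreach( n=it, .combine=c ) %dopar% {
>>>>   replicate(n, doOneIteration())
>>>> }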
>>>>
>>>>
>>>>
>>>> On Tue, Dec 8, 2009 at 8:02 PM, Debabrata Midya
>>>> <Debabrata.Midya at services.nsw.gov.au> wrote:
>>>>
>>>>
>>>>
>>>> Dear hpc users,
>>>>
>>>> Thanks in advance.
>>>>
>>>> I am using R 2.10.0 on Windows XP, Intel(R) Core(TM) 2 Duo CPU E8400
>>>> @ 3.00GHz, 1.96 GB of RAM.
>>>>
>>>> I would like to have a parallel version of the code below. This is
>>>> example code only.
>>>>
>>>> A: cursol contains the values of x[1], x[2], x[3] and the minimum
>>>> value of func at those x's.
>>>>
>>>> ###########
>>>> # Code starts
>>>> ###########
>>>>
>>>> func <- function(x)
>>>> {
>>>>   10 * x[1] - 11 * x[2] + 12 * x[3]
>>>> }
>>>>
>>>> main <- function(xl, xu)
>>>> {
>>>>   checkval <- function(x, xl, xu)
>>>>   {
>>>>     i <- 1
>>>>     while (i <= 3)
>>>>     {
>>>>       if (x[i] < xl[i]) x[i] <- xl[i]
>>>>       if (x[i] > xu[i]) x[i] <- xu[i]
>>>>       i <- i + 1
>>>>     }
>>>>     x
>>>>   }
>>>>
>>>>   itermax <- 50
>>>>
>>>>   j <- 1
>>>>   while (j <= itermax)
>>>>   {
>>>>     # trainX is a 3 x 3 matrix
>>>>     trainX <- matrix(rep(xl, 3), 3, 3, byrow=TRUE) +
>>>>               matrix(runif(9, 0, 1), 3, 3)
>>>>     trainX <- t(apply(trainX, 1, checkval, xl, xu))
>>>>     sold <- apply(trainX, 1, func)
>>>>     osold <- cbind(1:3, sold)
>>>>     osold <- osold[order(osold[, 2]), ]
>>>>
>>>>     if (j > 1)
>>>>     {
>>>>       if (osold[1, 2] < cursol[1, 4])
>>>>       {
>>>>         cursol <- matrix(c(trainX[osold[1, 1], ], osold[1, 2]), 1, 4)
>>>>       }
>>>>     }
>>>>
>>>>     if (j <= 1)
>>>>     {
>>>>       cursol <- matrix(c(trainX[osold[1, 1], ], osold[1, 2]), 1, 4)
>>>>     }
>>>>
>>>>     j <- j + 1
>>>>   }
>>>>
>>>>   cursol
>>>> }
>>>>
>>>> xl <- c(1, 2, 3)
>>>> xu <- c(2, 3, 4)
>>>>
>>>> cursol <- main(xl, xu)
>>>> cursol
>>>>
>>>>
>>>> ###########
>>>> # Code ends
>>>> ###########
>>>>
>>>> Once again, thank you very much for the time you have given.
>>>>
>>>> I am looking forward to your reply.
>>>>
>>>> Regards,
>>>>
>>>> Deb
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Absolute certainty is the privilege of mathematicians and madmen.
>>>> It is, for scientists, an unobtainable ideal.
>>>> --Cassius J. Keyser
>>>>
>>>>
>>>
>>
>>
>> --
>> Martin Morgan
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
>>
>


