[R-sig-hpc] Using snow on a looping structure

Martin Morgan mtmorgan at fhcrc.org
Fri Dec 5 18:44:37 CET 2008


"Gang Chen" <gangchen6 at gmail.com> writes:

> Hi Martin,
>
> Your suggestion really helps! It's exactly what I wanted. I really
> appreciate it...
>
> Regarding the array-munging part, the following will do:
>
> b <- aperm(b, c(2,3,4,1))
>
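
For readers following along, a minimal sketch of that reordering on a toy array (the array and its sizes are illustrative assumptions, not the poster's data). It uses base apply(), so no cluster is needed to check the dimension bookkeeping:

```r
# Toy array: 2 x 3 x 4 cells, each holding 5 values in the 4th dimension.
a <- array(1:(2 * 3 * 4 * 5), c(2, 3, 4, 5))

# apply() places the values returned by FUN (two, from range) in the
# first dimension of the result.
b <- apply(a, c(1, 2, 3), range)
dimBefore <- dim(b)

# Move that first dimension to the end, so the results sit in the 4th
# dimension as in the original rStat layout.
b <- aperm(b, c(2, 3, 4, 1))
dimAfter <- dim(b)

# Spot check one cell against a direct computation.
cellOk <- identical(b[1, 2, 3, ], range(a[1, 2, 3, ]))
```

The same aperm() call should apply unchanged to the array returned by parApply(), since it only permutes dimensions after the computation.
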
> I have a couple of related issues now:
>
> (1) When running the following I get two warnings on my Mac OS X
> 10.4.11 (one from each processor, I guess):
>
>> cl <- makeCluster(2, type = "SOCK")
> WARNING: ignoring environment value of R_HOME
> WARNING: ignoring environment value of R_HOME
>
> Why does this warning appear, and how can I correct it?

I guess you have a variable R_HOME specified in your environment, and
it is different from the location where the R workers are running
from. I suspect though that R is easily fooled, e.g., by symbolic
links. You might want to make sure that you're actually starting the
right R (ask, e.g., for the worker sessionInfo()).
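
A minimal sketch of such a check, assuming the parallel package bundled with current R (its cluster functions derive from snow; with snow itself the same calls should work on a cluster made with makeCluster(2, type = "SOCK")). The variable names are illustrative:

```r
library(parallel)

cl <- makeCluster(2)

# Ask each worker where its R installation lives and which version it runs.
workerHome <- clusterCall(cl, R.home)
workerVersion <- clusterEvalQ(cl, R.version.string)

# clusterCall(cl, sessionInfo) would return the full session details instead.
stopCluster(cl)

# Compare with the master session and with the environment variable.
masterHome <- R.home()
envHome <- Sys.getenv("R_HOME")
```

If workerHome differs from the R_HOME value set in the shell that launched R, that mismatch is the likely source of the warning.
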

> (2) Previously I could follow the progress of the job by sticking
> the following
>
> print(format(Sys.time(), "%D %H:%M:%OS3"))
>
> inside the outermost for loop (with ii index), but now with parallel
> computing I couldn't find a similar way to trace the progress. Do you
> or anybody know how to do that?

You might modify runAna to write a timestamp to a (worker-specific)
file, and track that. It is not a nice hack. Maybe others have better
ideas?
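
A rough sketch of that timestamp-file idea, under stated assumptions: it uses the parallel package bundled with current R rather than snow, runAna here is a hypothetical stand-in that just returns the range, and runAnaLogged, logDir and the log file naming are made up for illustration:

```r
library(parallel)

# Hypothetical stand-in for the thread's runAna().
runAna <- function(myData, ...) range(myData)

# Wrapper: append a timestamp to a per-worker file (named by process id),
# then run the analysis function passed in as 'ana'.
runAnaLogged <- function(myData, logDir, ana, ...) {
    logFile <- file.path(logDir, sprintf("worker-%d.log", Sys.getpid()))
    cat(format(Sys.time(), "%D %H:%M:%OS3"), "\n",
        file = logFile, append = TRUE)
    ana(myData, ...)
}

logDir <- tempfile("progress")
dir.create(logDir)

cl <- makeCluster(2)
rData <- array(rnorm(2 * 3 * 4 * 5), c(2, 3, 4, 5))
rStat <- parApply(cl, rData, c(1, 2, 3), runAnaLogged,
                  logDir = logDir, ana = runAna)
stopCluster(cl)

# Each analysed cell leaves one line in some worker's log.
logFiles <- list.files(logDir, full.names = TRUE)
nLogged <- sum(vapply(logFiles, function(f) length(readLines(f)),
                      integer(1)))
```

While a long job runs, counting the lines in the files under logDir from another session gives a crude progress estimate.
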

Martin

> Thanks again,
> Gang
>
>
> On Wed, Dec 3, 2008 at 12:05 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> Hi Gang --
>>
>> "Gang Chen" <gangchen6 at gmail.com> writes:
>>
>>> I'm new to parallel computing, so sorry for this simple question.
>>>
>>> My original code without parallel computing is like this:
>>>
>>> runAna <- function(myData, Model, ...) {
>>>       # myStat: a vector of length nStat
>>>       myStat <- wFun(myData, Model, ...)
>>>       return(myStat)
>>> }
>>>
>>> rStat <- array(0, dim=c(dimx, dimy, dimz, nStat))
>>> for (i in 1:dimx)
>>> for (j in 1:dimy)
>>> for (k in 1:dimz)
>>>      # each analysis is on the 4th dimension, and returns nStat numbers
>>>      # which are stored in the 4th dimension of rStat
>>>      rStat[i, j, k,] <- runAna(rData[i, j, k,], Model, ...)
>>
>> I think what you want is along the lines of
>>
>>> a <- array(1:(2*3*4*5), c(2,3,4,5))
>>> b <- apply(a, c(1,2,3), range)
>>
>> and then as you guessed
>>
>>> library(snow)
>>> cl <- makeCluster(nNodes, type="SOCK")
>>> d <- parApply(cl, a, c(1,2,3), range)
>>> identical(b, d)
>> [1] TRUE
>>
>> so for your example, I'd guess
>>
>> runStat <- parApply(cl, rData, c(1,2,3), runAna, Model=Model)
>>
>> This is not quite what you want -- the 'result' dimension is the first
>> rather than last
>>
>>> dim(a)
>> [1] 2 3 4 5
>>> dim(b)
>> [1] 2 2 3 4
>>
>> array-munging is not a speciality of mine, but a simple work-around is
>> to reorder the dimensions of the original array, so you're applying
>> to, and writing in, the slice indexed by the first entry
>>
>>> a <- array(1:(5*2*3*4), c(5,2,3,4))
>>> b <- apply(a, c(2,3,4), range)
>>> dim(a)
>> [1] 5 2 3 4
>>> dim(b)
>> [1] 2 2 3 4
>>> d <- parApply(cl, a, c(2,3,4), range)
>>
>> Hope that helps,
>>
>> Martin
>>
>>> I'm trying to run the above analysis using snow on a machine with two
>>> processors, but could not figure out how to correctly set it up:
>>>
>>> nNodes <- 2
>>> library(snow)
>>> cl <- makeCluster(nNodes, type = "SOCK")
>>>
>>> I thought I would use parApply, but how should I combine the looping
>>> with parApply? Or no looping at all with something like parApply(cl,
>>> rData, c(1,2,3), ...)?
>>>
>>> Thanks in advance,
>>> Gang
>>
>> --
>> Martin Morgan
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M2 B169
>> Phone: (206) 667-2793

