[R-sig-hpc] SNOW Hybrid Cluster in R, Network problems

Martin Seilmayer M.Seilmayer at hzdr.de
Wed Jul 4 12:06:05 CEST 2012


Hi,

I solved the problem of high network load when using parLapply!

I will explain it briefly with a similar example:

cl<-makeCluster(4,type="SOCK")

spectrum.lomb <- function(x, y) { ... }
# x ... space vector, e.g. time or location
# y ... measured data

spectrum.lomb calculates a periodogram (frequency analysis) depending on 
x and y

We have many data sets y, all measured over the same space vector x, so a
data list can be created:

dat <- list(all_y)    # dat[[1]] = y1; dat[[2]] = y2; ... and so on

If you want to use lapply, you have to ensure that the first argument of
FUN receives a list element of 'dat'. So in the case of our function we
need to change the order of the arguments, because x is always the same!!!
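As a toy illustration of that rule (made-up data and a made-up function, not the original analysis code): lapply() and parLapply() always feed the list element into FUN's first argument, while anything constant has to travel through the extra '...' arguments.

```r
# Toy stand-ins, NOT the real sensor data: lapply()/parLapply() always pass
# the list element as FUN's FIRST argument; constants go through '...'.
dat <- list(y1 = 1:3, y2 = 4:6)    # two "measurement" vectors
x   <- c(10, 20, 30)               # the shared space vector
f   <- function(y, x) sum(x * y)   # y comes first, so dat[[i]] lands on y
res <- lapply(dat, f, x = x)
res$y1    # 140
res$y2    # 320
```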

fun2 <- function()
{
    # export ONLY the functions, not the data, to each node
    clusterExport(cl, ls.str(mode = "function", envir = .GlobalEnv))

    # anonymous wrapper swaps the argument order; x travels as the named argument b
    result <- parLapply(cl, dat, function(a, b) spectrum.lomb(b, a), b = x)
    return(result)
}
fun2()    # produces heavy network load

Now take into account that parallel computing needs to distribute all
relevant data to each node; in this case that should only be the data list
and the constant x vector. But this call ends up causing heavy network load
and a network malfunction on 2 of my Windows workers. For some reason the
indirect call of spectrum.lomb(x,y) causes R to distribute the whole
environment to each node. In my case sensor data is being processed and the
environment is about 500 MB large. The result is that n times 500 MB get
shipped around, consuming a lot of memory on the machines.
!!! If you call 'clusterExport' and 'parLapply' from the console instead,
everything is fine, and the network load is as low as possible. !!!
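The effect can be reproduced without a cluster. A minimal sketch with made-up data, assuming snow serializes FUN together with its enclosing environment when shipping it to a worker: a closure created inside a function drags all of that function's local data along, while a function whose environment is the global environment serializes to almost nothing (the global environment is written as a reference, not by value).

```r
a_plus_b <- function(a, b) a + b        # defined at top level: env = globalenv()

make_closures <- function() {
  big   <- rnorm(1e6)                   # ~8 MB of local "sensor" data
  inner <- function(a, b) a + b         # closure over this local environment
  list(inner = inner, global = a_plus_b)
}

f <- make_closures()
size_inner  <- length(serialize(f$inner,  NULL))  # carries 'big' along
size_global <- length(serialize(f$global, NULL))  # global env sent by reference
size_inner  > 8e6    # TRUE: megabytes per shipped function
size_global < 1e4    # TRUE: only a few hundred bytes
```

This matches the symptom above: a wrapper defined inside fun2 belongs to fun2's evaluation environment, whose parent chain reaches the ~500 MB workspace, whereas the same call typed at the console creates the wrapper directly in the global environment.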

# changing the argument order in the definition of the function

spectrum.lomb <- function(y, x) { ... }
# x ... space vector, e.g. time or location
# y ... measured data
fun3 <- function()
{
    # export ONLY the functions, not the data, to each node
    clusterExport(cl, ls.str(mode = "function", envir = .GlobalEnv))

    # spectrum.lomb is passed directly; the constant x travels as a named argument
    result <- parLapply(cl, dat, spectrum.lomb, x = constant_vector)
    return(result)
}
fun3()    # causes minimal network load
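A hypothetical alternative (not from the original post) for cases where reordering the arguments is not an option: keep the wrapper, but detach it from the calling function's environment before handing it to parLapply. The sketch below uses a stand-in for spectrum.lomb and shows, via serialize(), that the local data is no longer dragged along.

```r
# Hypothetical alternative (not in the original post): keep spectrum.lomb(x, y)
# as-is, but strip the wrapper's captured environment before shipping it.
spectrum_stub <- function(x, y) x + y          # stand-in for spectrum.lomb

fun2b <- function() {
  big     <- rnorm(1e6)                        # large local data
  wrapper <- function(a, b) spectrum_stub(b, a)  # would capture 'big'
  environment(wrapper) <- globalenv()          # detach the local frame
  wrapper
}

w <- fun2b()
length(serialize(w, NULL)) < 1e4   # TRUE: 'big' is no longer dragged along
w(1:3, 10)                         # still works: spectrum_stub(10, 1:3)
```

The caveat: after the environment reset, the wrapper can only see globals, so any local variables it relies on must be passed explicitly as arguments or exported with clusterExport.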

So in the end I rewrote my definition and the problem is gone. But could
anybody explain that to me?! I see the point with the environments, but
I don't understand it.


Cu and many thanks!

Martin





Dipl.-Ing. Martin Seilmayer

Helmholtz-Zentrum Dresden-Rossendorf e. V.

Institut fuer Fluiddynamik
Abteilung Magnetohydrodynamik
Bautzner Landstraße 400
01328 Dresden, Germany

@fon: +49 351 260  3165
@fax: +49 351 260 12969
@web: www.hzdr.de

On 03.07.2012 20:03, Stephen Weston wrote:
> Hi Martin,
>
> So if mpiexec works on your machines, why aren't you creating
> a snow MPI cluster?
>
> - Steve
>
>
> On Tue, Jul 3, 2012 at 11:09 AM, Martin Seilmayer <M.Seilmayer at hzdr.de> wrote:
>> Hi Steve,
>>
>> :) so you pointed out the most interesting problem in this challenge: "How
>> to start (fast) a remote process on Windows from a console"
>>
>> The aim was to find a clean solution without mixing up Windows with Linux
>> stuff like ssh, PsTools or rsh, or vice versa. With MPICH2 and its smpd
>> service/daemon running, it is possible to start processes on each worker.
>> I am working on a Windows machine, so the R command to start a
>> Windows worker is as follows:
>>      system(paste("mpiexec -hosts ",length(nodestostart),"
>> ",paste(nodestostart,collapse=" ")," C:\\MPI_R_Start.bat
>> ",hostname,sep=""),wait=F)
>> "wait = F" is important, because the next command should be
>> makeCluster(listofnodes, type = "SOCK")
>> So I wrote a batch file for Windows and a shell script for Linux, which set
>> up each machine individually and which are started remotely by MPICH2. This
>> config script starts as many Rscript processes as I had configured before.
>> The master on the other side knows how many workers are available on each
>> machine.
>>
>> Important! It needs no configuration like rshcmd (because ssh is not
>> available on Windows). One must guarantee that "makeCluster()" comes after
>> "starting the workers"; that's the trick. This works fine, because
>> makeCluster() makes the master wait for each worker to call back. If the
>> master is not responding, the worker shuts down, at least in the Linux
>> world.
>>
>> And finally: yes, MPICH2 is able to start every program / command on a
>> remote machine. That's a bit of a security problem, if you look at it
>> strictly.
>>
>> I hope that answers your question!
>>
>> Martin
>>
>>
>>
>>
>> On 03.07.2012 16:44, Stephen Weston wrote:
>>
>>> On Tue, Jul 3, 2012 at 5:31 AM, Martin Seilmayer <M.Seilmayer at hzdr.de>
>>> wrote:
>>>> Hi all of you,
>>>>
>>>> I successfully created a hybrid cluster of several Windows and Linux
>>>> machines using snow and MPICH2. Basically I set up a SOCK cluster. To
>>>> start the Rscript processes on each machine, MPICH2 comes into play.
>>>> Because it
>>> Are you saying that you're using MPICH2 to start your workers on the
>>> remote machines when creating a SOCK cluster?  How are you doing that?
>>> Are you setting the rshcmd option in some way?  Does MPICH2 include a
>>> remote execution command that is ssh-like?
>>>
>>> - Steve Weston
>>
>>


