[R-sig-hpc] SNOW Hybrid Cluster in R, Network problems
Martin Seilmayer
M.Seilmayer at hzdr.de
Wed Jul 4 12:06:05 CEST 2012
Hi,
I solved the problem of big network load using parLapply!
I will explain it briefly with a similar example:
cl <- makeCluster(4, type = "SOCK")

spectrum.lomb <- function(x, y) { .... }
# x ... space vector, e.g. time or location
# y ... measured data

spectrum.lomb calculates a periodogram (frequency analysis) depending on
x and y. We have many data sets in y with the same space vector x, so a
data list can be created:

dat <- list(all_y)   # dat[[1]] = y1; dat[[2]] = y2; .... and so on
If you want to use lapply, one has to ensure that the first argument of
the function FUN is a list element of 'dat'. So in the case of our
function we need to change the order of the arguments, because x is
always the same!
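As a quick illustration of that rule (a sketch with made-up data, not my actual sensor data): lapply, and parLapply likewise, pass each list element to FUN as its first argument, and any further arguments are forwarded unchanged to every call:

```r
# FUN receives the list element as 'y'; 'x' is forwarded to every call
f <- function(y, x) sum(y) + x

dat <- list(1:3, 4:6)
lapply(dat, f, x = 100)
# [[1]] 106   (sum(1:3) + 100)
# [[2]] 115   (sum(4:6) + 100)
```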
fun2 <- function()
{
    # export ONLY the functions, not the DATA, to each node
    clusterExport(cl, ls.str(mode = "function", envir = .GlobalEnv))
    # note the changed argument order in the anonymous wrapper!
    result <- parLapply(cl, dat, function(a, b) spectrum.lomb(b, a), b = x)
    return(result)
}

fun2()   # produces much network load
Now take into account that parallel computing needs to distribute all
relevant data to each node. In this case that should only be the data
list and the constant x vector. But this call ends up in heavy network
load and a malfunctioning network on 2 of my Windows workers. For some
reason the indirect call of spectrum.lomb(x, y) causes R to distribute
the whole environment to each node. In my case sensor data is being
processed and the environment is about 500 MB large. The result is that
n times 500 MB travel over the network and occupy a lot of memory on the
machines.
!!! Now if you call 'clusterExport' and 'parLapply' from the console,
everything is fine, and the network load is as low as possible. !!!
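A small experiment that illustrates the environment effect (my interpretation, with made-up names; it may not capture every detail of what snow does): a function created inside another function carries its enclosing frame with it, and everything reachable from that frame gets serialized, whereas a function defined at top level only references .GlobalEnv, which is not serialized along:

```r
make_closure <- function() {
    big <- rnorm(1e6)            # ~8 MB living in this function's frame
    function(a) a + 1            # its enclosure is the frame, so 'big' tags along
}
heavy <- make_closure()
light <- function(a) a + 1       # its enclosure is .GlobalEnv (stored as a reference)

length(serialize(heavy, NULL))   # several MB
length(serialize(light, NULL))   # a few hundred bytes
```

This mirrors the fun2 vs. fun3 situation: the anonymous wrapper in fun2 is created inside fun2's frame, while spectrum.lomb in fun3 lives in .GlobalEnv.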
# changing the argument order in the definition of the function
spectrum.lomb <- function(y, x) { .... }
# x ... space vector, e.g. time or location
# y ... measured data

fun3 <- function()
{
    # export ONLY the functions, not the DATA, to each node
    clusterExport(cl, ls.str(mode = "function", envir = .GlobalEnv))
    # x is now passed as a named extra argument; no wrapper needed
    result <- parLapply(cl, dat, spectrum.lomb, x = constant_vector)
    return(result)
}

fun3()   # causes minimal network load
So in the end I rewrote my definition and the problem is gone. But could
anybody explain this to me? I see the point with the environments, but I
don't understand it.
Cu and many thanks!
Martin
Dipl.-Ing. Martin Seilmayer
Helmholtz-Zentrum Dresden-Rossendorf e. V.
Institut fuer Fluiddynamik
Abteilung Magnetohydrodynamik
Bautzner Landstraße 400
01328 Dresden, Germany
@fon: +49 351 260 3165
@fax: +49 351 260 12969
@web: www.hzdr.de
On 03.07.2012 20:03, Stephen Weston wrote:
> Hi Martin,
>
> So if mpiexec works on your machines, why aren't you creating
> a snow MPI cluster?
>
> - Steve
>
>
> On Tue, Jul 3, 2012 at 11:09 AM, Martin Seilmayer <M.Seilmayer at hzdr.de> wrote:
>> Hi Steve,
>>
>> :) so you pointed out the most interesting problem in this challenge: "How
>> to start (fast) a remote process on Windows from a console"
>>
>> The aim was to find a clean solution without mixing up Windows with Linux
>> stuff like ssh, PsTools or rsh, or vice versa. With MPICH2 and the smpd
>> service/daemon running, it is possible to start processes on each worker.
>> I am working on a Windows machine, so the R command to start a
>> Windows worker is as follows:
>> system(paste("mpiexec -hosts ", length(nodestostart), " ",
>>              paste(nodestostart, collapse = " "),
>>              " C:\\MPI_R_Start.bat ", hostname, sep = ""), wait = F)
>> "wait = F" is important, because the next command should be
>> makeCluster(listofnodes, type = "SOCK")
>> So I wrote a batch file for Windows and a shell script for Linux, which set
>> up each machine individually and which are started remotely via MPICH2. This
>> config script starts as many Rscript processes as I had configured before.
>> The master on the other side knows how many workers are available on each
>> machine.
>>
>> Important! It needs no configuration like rshcmd (because ssh is not
>> available on Windows). One must guarantee that "makeCluster()" comes after
>> starting the workers; that's the "trick". This works fine, because
>> makeCluster() makes the master wait for each worker to call back. If the
>> master is not responding, the worker shuts down, at least in the Linux world.
>>
>> And finally: yes, MPICH2 is able to start any program / command on a remote
>> machine. That's a bit of a security problem, if you want to be very strict
>> about it.
>>
>> I hope that answers your question!
>>
>> Martin
>>
>>
>>
>> Dipl.-Ing. Martin Seilmayer
>>
>> Helmholtz-Zentrum Dresden-Rossendorf e. V.
>>
>> Institut fuer Fluiddynamik
>> Abteilung Magnetohydrodynamik
>> Bautzner Landstraße 400
>> 01328 Dresden, Germany
>>
>> @fon: +49 351 260 3165
>> @fax: +49 351 260 12969
>> @web: www.hzdr.de
>>
>> On 03.07.2012 16:44, Stephen Weston wrote:
>>
>>> On Tue, Jul 3, 2012 at 5:31 AM, Martin Seilmayer <M.Seilmayer at hzdr.de>
>>> wrote:
>>>> Hi all of you,
>>>>
>>>> I successfully created a hybrid cluster of several Windows and Linux
>>>> machines using snow and MPICH2. Basically I set up a SOCK cluster. To
>>>> start the Rscript processes on each machine, MPICH2 comes into the game.
>>>> Because it
>>> Are you saying that you're using MPICH2 to start your workers on the
>>> remote machines when creating a SOCK cluster? How are you doing that?
>>> Are you setting the rshcmd option in some way? Does MPICH2 include a
>>> remote execution command that is ssh-like?
>>>
>>> - Steve Weston
>>
>>