[R-sig-hpc] R <--> TM <--> Snow <--> Rmpi <--> OpenMPI cluster cleanup

Mark Mueller mark.mueller at gmail.com
Wed Aug 26 16:30:57 CEST 2009


The deactivateCluster() function in the tm package essentially calls
stopCluster(getMPIcluster()) from the snow package.
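
For anyone who wants to try the shutdown by hand, here is a minimal
sketch of that sequence. The helper name shut_down_mpi_cluster is
hypothetical (it is not part of tm, snow, or Rmpi), and the final
Rmpi::mpi.quit() step is only an assumption about what might let mpirun
see a clean finalize; it has not been confirmed in this thread.

```r
# Hypothetical helper, not part of tm, snow, or Rmpi.
# Stops the snow MPI cluster, then optionally finalizes MPI before R exits.
shut_down_mpi_cluster <- function(finalize = TRUE) {
  # Look up the MPI cluster that snow has registered for this session.
  cl <- snow::getMPIcluster()

  # Guard in case no MPI cluster is currently registered.
  if (!is.null(cl)) {
    # This is the call that deactivateCluster() is said to make.
    snow::stopCluster(cl)
  }

  if (finalize) {
    # Assumption: the mpirun warning appears because R exits without
    # MPI being finalized. mpi.quit() finalizes MPI and then quits R,
    # so nothing after this line will run.
    Rmpi::mpi.quit(save = "no")
  }

  invisible(NULL)
}
```

Calling shut_down_mpi_cluster() as the last statement of the batch
script would be the intended use; passing finalize = FALSE only stops
the cluster and leaves the R session running.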

Does anyone know if the authors of the snow and Rmpi packages are on
this list?

On Tue, Aug 25, 2009 at 11:29 PM, Ross Boylan <ross at biostat.ucsf.edu> wrote:
> On Tue, 2009-08-25 at 20:50 -0500, Mark Mueller wrote:
>> PROBLEM DEFINITION --
>>
>> Host environment:
>>
>> - AMD_64, 4xCPU, quad core
>> - Ubuntu 9.04 64-bit
>> - OpenMPI 1.3.2 (to avoid the problem in v1.3 where OpenMPI tries to connect
>> to the localhost via ssh to run local jobs) - manually downloaded source and
>> compiled
>> - Rmpi 0.5-7
>> - TM 0.4
>> - Snow 0.3-3
>> - R 2.9.0
>>
>> When executing the following command on the host:
>>
>> $ mpirun --hostfile <some file> -np 1 R CMD BATCH <some program>.R
>>
>> the following results, yet the <some program>.R completes successfully:
>>
>> "mpirun has exited due to process rank 0 with PID [some pid] on node
>> [node name here] exiting without calling "finalize". This may have
>> caused other processes in the application to be terminated by signals
>> sent by mpirun (as reported here)."
>>
>> CONFIGURATION STEPS TAKEN --
>>
>> - The hostfile does not create a situation where the system is
>> oversubscribed.  In this case, slots=4 and max-slots=5.
>>
>> - The <some program>.R uses snow::activateCluster() and
>> snow::deactivateCluster() in the appropriate places.  There are no
>> other code elements that control MPI in the <some program>.R file.
> FWIW, I use stopCluster(getMPIcluster()) on Debian Lenny (OpenMPI 1.2)
> and that seems to work.  I have a feeling that might be an rmpi command
> rather than a snow command, even though it's a snow session; maybe I
> should shift to deactivateCluster.  On the other hand, maybe
> deactivateCluster() doesn't quite shut down.
>
> The system I'm using for this is inaccessible right now, and so I can't
> easily check the details.
>
> Ross
>
>


