[R-sig-hpc] Snow Not Distributing

Stephen Weston stephen.b.weston at gmail.com
Sun Jan 22 20:41:18 CET 2012


Have you verified that Torque is allocating multiple nodes for your job?
If so, are you using some sort of mpirun command to execute your
R script?  Are you using one of the mpirun --hostfile or --machinefile
options to tell mpirun what nodes to execute on, or are you depending
on MPI/Torque integration to get the allocated hosts?  Open MPI must
be configured with the --with-tm option for Torque integration, for example.
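
As a rough sketch of the first check (not a definitive recipe), something
like the following could go at the top of the job script. It assumes Torque
exports PBS_NODEFILE inside the job, as it normally does; count_hosts is a
helper name made up for this example, and script.R is a hypothetical file
name.

```shell
# Count the distinct hosts in a Torque node file. The file has one line
# per allocated slot, so a host appears once for each core granted on it.
count_hosts() {
    sort -u "$1" | wc -l | tr -d ' '
}

# Inside a Torque job, PBS_NODEFILE should point at the allocated slots.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "distinct hosts allocated: $(count_hosts "$PBS_NODEFILE")"
    # If Open MPI was built without Torque support, the host list can be
    # passed explicitly (assumption: script.R creates the snow cluster):
    # mpirun -np 1 --hostfile "$PBS_NODEFILE" R --no-save -f script.R
else
    echo "PBS_NODEFILE is not set; probably not inside a Torque job"
fi
```

If this reports a single host, the problem is in the Torque resource request
rather than in snow or Rmpi.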

- Steve


On Fri, Jan 20, 2012 at 4:53 PM, Jeff Allen <lists at jdadesign.net> wrote:
> I have been able to successfully set up snow (0.3-5) and Rmpi (0.5-9) on my
> RedHat 5 cluster, and have it working perfectly for jobs that don't span
> multiple nodes.
>
> We're using Torque for resource management, so I start a job with access to
> multiple nodes and load snow. Unfortunately, no matter what size cluster I
> try to make, all of the workers end up running on the same host -- leaving
> the other hosts idle.
>
> I'm no expert with MPI or snow, so I'm really not sure how to approach
> debugging this.
>
> Any input would be much appreciated!
>
> Jeff


