[R-sig-hpc] Snow Not Distributing
stephen.b.weston at gmail.com
Sun Jan 22 20:41:18 CET 2012
Have you verified that Torque is allocating multiple nodes for your job?
If so, are you using some sort of mpirun command to execute your
R script? Are you using one of the mpirun --hostfile or --machinefile
options to tell mpirun what nodes to execute on, or are you depending
on MPI/Torque integration to get the allocated hosts? Open MPI must
be configured with the --with-tm option for Torque integration, for example.
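
One quick way to answer the first question from inside the job is to look at the node file Torque writes. This is only a minimal sketch, assuming a POSIX shell and that Torque exports PBS_NODEFILE in the job environment (one line per allocated slot). The function names summarize_nodefile and count_hosts are made up for illustration.

```shell
#!/bin/sh
# Sketch: summarize the hosts Torque allocated to the current job.
# Assumption: PBS_NODEFILE points at a file with one hostname per slot.

# Print "<slots> <host>" for each distinct host in a node file.
summarize_nodefile() {
    sort "$1" | uniq -c | awk '{ print $1, $2 }'
}

# Print the number of distinct hosts in a node file.
count_hosts() {
    sort -u "$1" | wc -l | tr -d ' '
}

if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Distinct hosts allocated: $(count_hosts "$PBS_NODEFILE")"
    summarize_nodefile "$PBS_NODEFILE"
else
    echo "PBS_NODEFILE is not set or not readable; run this inside a Torque job."
fi
```

If this reports a single host, the problem is in the Torque resource request rather than in snow or Rmpi. If it reports several hosts but the workers still land on one, passing the file explicitly (for example mpirun --hostfile $PBS_NODEFILE ...) is one way to rule out missing Torque integration. With Open MPI, running ompi_info | grep tm should list tm components if it was built with --with-tm.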
On Fri, Jan 20, 2012 at 4:53 PM, Jeff Allen <lists at jdadesign.net> wrote:
> I have been able to successfully set up snow (0.3-5) and Rmpi (0.5-9) on my
> RedHat 5 cluster, and have it working perfectly for jobs that don't span
> multiple nodes.
> We're using Torque for resource management, so I start a job with access to
> multiple nodes and load snow. Unfortunately, no matter what size cluster I
> try to make, all of the workers end up running on the same host -- leaving
> the other hosts idle.
> I'm no expert with MPI or snow, so I'm really not sure how to approach
> debugging this.
> Any input would be much appreciated!