We are seeing some odd behavior in the allocation of R tasks to MPI slots
under Open MPI.  All of the following takes place on a parallel compute
system we built and run on Amazon Web Services.  The hardware is two AWS
Cluster Compute instances (cc1.4xlarge), each with 8 cores.  We designate
one node as the "head" node and the other as the "client" node.

Scenario A: We request, via the hostfile, 8 slots on the head node and 8
slots on the client node.  We then spawn an R cluster using Rmpi, with 1
master and 15 slaves.  The master (rank 0) and 8 slaves (ranks 1-8), i.e. a
total of 9 R tasks, appear on the head node, and only 7 slaves (ranks 9-15)
appear on the client node.
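
For concreteness, here is a minimal sketch of the setup (host names, file
names, and the exact launch command are placeholders, not our actual
configuration):

    # hostfile_A -- Scenario A
    head    slots=8
    client  slots=8

    # Master R session, started on the head node with something like:
    #   mpirun -np 1 --hostfile hostfile_A R --no-save -f spawn.R
    library(Rmpi)
    mpi.spawn.Rslaves(nslaves = 15)     # 1 master + 15 slaves = 16 R tasks
    # report which node each slave landed on
    mpi.remote.exec(Sys.info()["nodename"])
    mpi.close.Rslaves()
    mpi.quit()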

Scenario B: We request, via the hostfile, 7 slots on the head node and 8
slots on the client node.  We then spawn an R cluster using Rmpi, with 1
master and 15 slaves (same as Scenario A).  The master (rank 0) and 7
slaves (ranks 1-7), i.e. a total of 8 R tasks, appear on the head node, and
8 slaves (ranks 8-15) appear on the client node.
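
The only difference from Scenario A is the head node's slot count in the
hostfile:

    # hostfile_B -- Scenario B
    head    slots=7
    client  slots=8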

We have run several parallel jobs under both scenarios, and Scenario A is
consistently slower than Scenario B, by roughly 40%.  This suggests that in
Scenario A the head node really is oversubscribed: it is running more R
tasks (9) than it has cores (8).
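
In case it helps anyone reproduce this: Open MPI's mpirun has a
--display-map option (availability may depend on the Open MPI version)
that should show where each process is placed at launch, e.g.:

    mpirun -np 1 --hostfile hostfile_A --display-map R --no-save -f spawn.R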

Does anyone have insight into the source of this behavior?  Is it Open MPI,
Rmpi, or some interaction between the two?  Thanks.

Jeff Howbert

Brian Pratt

Insilicos LLC
