[R-sig-hpc] ff and parallel computing (multiple nodes, shared disk)

Andrew Piskorski atp at piskorski.com
Sat Nov 14 11:02:57 CET 2009


On Thu, Nov 12, 2009 at 04:29:34PM -0200, Benilton Carvalho wrote:

> I wrote my own code to use NetCDF, which doesn't perform well when I  
> need random access to the data.

What sort of I/O numbers do you actually see?

You're hitting a single shared disk server with random-access I/O
requests from multiple nodes?  If so, isn't that probably the problem
right there?  Random access is a disk speed killer, and I wouldn't
expect swapping NetCDF vs. SQLite vs. ff vs. bigmemory to make much
difference.  In that case, the things I'd expect to help are:

- Massively faster shared disk I/O (hardware upgrade).
- Moving I/O to the slave nodes.
- Perhaps running an RDBMS that knows how to better optimize incoming
  client I/O requests.
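To see how big the sequential-vs-random gap is on your own storage,
here's a minimal sketch (in Python rather than R, just for
illustration; the file size and block size are arbitrary assumptions,
and a real test needs a file much larger than the OS page cache,
ideally on the shared filesystem):

```python
import os
import random
import tempfile
import time

# Hypothetical benchmark: time sequential vs. random block reads on a
# scratch file.  Sizes are kept tiny so it runs quickly; on a cached
# local file the gap will be small, on real shared disk it is dramatic.
N_BLOCKS = 1024
BLOCK = 4096

tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(os.urandom(N_BLOCKS * BLOCK))
tmp.close()
path = tmp.name

def sequential_read(path):
    # Read the whole file front to back in BLOCK-sized chunks.
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK):
            total += len(chunk)
    return total

def random_read(path, n):
    # Read n blocks at random offsets (seek + read each time).
    random.seed(0)
    total = 0
    with open(path, "rb") as f:
        for _ in range(n):
            f.seek(random.randrange(N_BLOCKS) * BLOCK)
            total += len(f.read(BLOCK))
    return total

t0 = time.perf_counter()
seq = sequential_read(path)
t_seq = time.perf_counter() - t0

t0 = time.perf_counter()
rnd = random_read(path, N_BLOCKS)
t_rnd = time.perf_counter() - t0

print(f"sequential: {seq} bytes in {t_seq:.4f}s")
print(f"random:     {rnd} bytes in {t_rnd:.4f}s")
os.remove(path)
```

The same idea translates directly to R with `system.time()` around
reads of an ff or NetCDF object, indexed sequentially vs. with a
shuffled index.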

Or is your situation a bit different from the original poster's, and
your code is I/O limited even with just one node?

-- 
Andrew Piskorski <atp at piskorski.com>
http://www.piskorski.com/
