[R-sig-hpc] ff and parallel computing (multiple nodes, shared disk)

Ramon Diaz-Uriarte rdiaz02 at gmail.com
Mon Nov 16 12:41:50 CET 2009


On Sat, Nov 14, 2009 at 11:02 AM, Andrew Piskorski <atp at piskorski.com> wrote:
> On Thu, Nov 12, 2009 at 04:29:34PM -0200, Benilton Carvalho wrote:
>
>> I wrote my own code to use NetCDF, which doesn't perform well when I
>> need random access to the data.
>
> What sort of I/O numbers do you actually see?
>
> You're hitting a single shared disk server with random access IO
> requests from multiple nodes?  If so, isn't that probably the problem
> right there?  Random access is a disk speed killer.  I wouldn't expect

Yes. That is one part of the problem.


> playing with NetCDF vs. SQLite vs. ff vs. bigmemory to make much
> difference.  Things I'd expect might help in that case would be:
>
> - Massively faster shared disk I/O (hardware upgrade).
> - Moving I/O to the slave nodes.


For some parallelized computations, I do not think I want to move the
I/O to the local slave nodes' storage.

First, if it is unpredictable (e.g., because of load balancing) which
node is going to access which part of the data, I must ensure that all
nodes can access any portion of the data. We could move all of the data
to the local disks before any read is attempted, but this still requires
copying a complete data frame (or equivalent) to each local disk. In
contrast, with the shared-disk approach we keep a single copy on the
shared disk and have each node access just the required column(s). For
now I am playing with the shared-disk approach; copying to the local
disks requires more code and I do not see how it would be much faster.
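
Roughly what I have in mind, as a minimal sketch (not my actual code:
the cluster setup, host names, and per-column statistic are made up):
the ff/ffdf objects only carry metadata, so they are cheap to ship to
the slaves, and each slave then opens just the ff file(s) for the
column(s) it needs from the shared disk.

library(ff)
library(snow)

## Assumed: 'bigdf' is an existing ffdf whose ff files live on the
## shared filesystem, reachable under the same path from every node.
cl <- makeCluster(c("node01", "node02"), type = "SOCK")  # hypothetical hosts
clusterEvalQ(cl, library(ff))

col.stat <- function(j, dat) {
  x <- dat[[j]]     # one ff column: a small proxy object, no data read yet
  open(x)           # (re)open the backing file on the shared disk
  out <- mean(x[])  # only this column's file is read
  close(x)
  out
}

res <- parLapply(cl, seq_len(ncol(bigdf)), col.stat, dat = bigdf)
stopCluster(cl)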


Second, even if the output of the computations is left on the local
drives (non-shared disk), it will eventually need to be moved somewhere
the master can collect and organize it. I can do that via the return
object of, say, a parallelized apply, or I can leave the output on the
shared disk and let the master help itself as needed. In my particular
case, putting together an ffdf object on the master is fast, because the
ffdf only references the ff files (created by the slaves) that live on
the shared disk, and those ff files do not actually need to be read in
their entirety when the ffdf is assembled.
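
As a sketch of that second point (again with made-up function and file
names): each slave leaves its piece of the output as an ff file on the
shared disk and returns only the small ff proxy object; the master then
binds those objects into an ffdf, which touches the metadata but not
the data.

library(ff)

## On a slave: run the (hypothetical) analysis for task i and leave the
## result as an ff file on the shared disk.
slave.task <- function(i, shared.dir) {
  res <- run.analysis(i)                        # hypothetical computation
  ff(res, vmode = "double",
     filename = file.path(shared.dir, sprintf("out_%03d.ff", i)),
     finalizer = "close")                       # keep the file after gc
}

## On the master, with 'cols' the list of ff vectors returned by the
## slaves (e.g., from a parallelized apply):
## names(cols) <- sprintf("out_%03d", seq_along(cols))
## big.out <- do.call(ffdf, cols)  # references the files, does not read them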


Best,


R.





> - Perhaps running an RDBMS that knows how to better optimize incoming
>  client I/O requests.
>
> Or is your situation a bit different than the original poster's, and
> your code is I/O limited even with just one node?
>
> --
> Andrew Piskorski <atp at piskorski.com>
> http://www.piskorski.com/
>
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>



-- 
Ramon Diaz-Uriarte
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz
Phone: +34-91-732-8000 ext. 3019


