[R-sig-hpc] ff and parallel computing (multiple nodes, shared disk)
Ramon Diaz-Uriarte rdiaz02 at gmail.com
Tue Nov 17 13:01:47 CET 2009
On Mon, Nov 16, 2009 at 1:13 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> On Mon, Nov 16, 2009 at 6:41 AM, Ramon Diaz-Uriarte <rdiaz02 at gmail.com>
>> On Sat, Nov 14, 2009 at 11:02 AM, Andrew Piskorski <atp at piskorski.com>
>> > On Thu, Nov 12, 2009 at 04:29:34PM -0200, Benilton Carvalho wrote:
>> >> I wrote my own code to use NetCDF, which doesn't perform well when I
>> >> need random access to the data.
>> > What sort of I/O numbers do you actually see?
>> > You're hitting a single shared disk server with random access IO
>> > requests from multiple nodes? If so, isn't that probably the problem
>> > right there? Random access is a disk speed killer. I wouldn't expect
>> Yes. That is one part of the problem.
>> > playing with NetCDF vs. SQLite vs. ff vs. bigmemory to make much
>> > difference. Things I'd expect might help in that case would be:
>> > - Massively faster shared disk I/O (hardware upgrade).
>> > - Moving I/O to the slave nodes.
>> For some parallelized computations, I think I do not want to move the
>> I/O to the local slave nodes' storage.
>> First, if it is unpredictable (e.g., from load balancing) which node
>> is going to access what part of the data, I must ensure all nodes can
>> access any portion of the data. We could move all that data to the
>> local disks before any read is attempted, but this still requires
>> copying a complete data frame (or equivalent) to each local disk. In
>> contrast, with the shared-disk approach, we leave only one copy on the
>> shared disk and have each node access just the required column(s). For
>> now I am playing with the first approach; the second requires more
>> code and I do not see how it would be much faster.
>> Second, even if the output of the computations is left on the local
>> drives (non-shared disk), it will eventually need to be moved
>> somewhere else where the master can collect and organize all that
>> output. I can do that as the return object from, say, a parallelized
>> apply, or I can leave it on the shared disk and let the master serve
>> itself as needed. In my particular case, putting together an ffdf
>> object in the master seems to be fast, as the ffdf only references ff
>> files (created by the slaves) that live on the shared disk space:
>> when putting together the ffdf object, those ff files do not actually
>> need to be read in their entirety.
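The master-side step I describe might be sketched like so (the output directory and file names are hypothetical, and it assumes each slave wrote one double-valued ff file to the shared disk):

```r
## Master: build an ffdf that merely references the slaves' ff files.
## Only metadata is read here; the bulk data stay on the shared disk,
## which is why assembling the ffdf is fast.
library(ff)

col_files <- file.path("/shared/scratch/out",
                       paste0("col", 1:20, ".ff"))  # made-up names
cols <- lapply(col_files, function(f)
  ff(filename = f, vmode = "double", readonly = TRUE))
names(cols) <- paste0("V", seq_along(cols))

res_ffdf <- do.call(ffdf, cols)  # no full read of the ff files
```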
> Hi, Ramon.
> My reply below is strolling off-topic, but it gets to at least part of the
> problem that you describe above.
> You might want to look into a general clustered file system such as lustre,
> gluster, or GFS. I can speak to gluster in particular, as we have used it a bit.
Thanks for the suggestion!!
> Basically, the file systems of several machines (for example, all the local
> storage on the nodes) are consolidated into a shared file system. Since
> this filesystem is shared over multiple machines, it is much less
> susceptible to being overwhelmed by many data streams reading/writing at
> once. Furthermore, these clustered file systems can usually be made
> aware of "local" storage, so that during parallel writes the data
> preferentially go to node-local storage rather than being randomly
> distributed among all the available nodes. We found Gluster to have the
> lowest setup cost (it runs as a FUSE plugin), but it still has some
> bugs; that said, for a high-performance shared cluster scratch space,
> we have had some success. See here for more details:
I am forwarding your email to our sys admins. From looking at the docs
for Gluster, if I understand correctly, FUSE can be run by a non-root
user, but we would still need the sys admins to set up the FUSE server
and install it.
And then you mention scratch space, some bugs, and "some success", so
you sound very cautious here. I guess shared binaries, shared homes,
etc., are not something you would place there?
>> > - Perhaps running an RDBMS that knows how to better optimize incoming
>> > client I/O requests.
>> > Or is your situation a bit different than the original poster's, and
>> > your code is I/O limited even with just one node?
>> > --
>> > Andrew Piskorski <atp at piskorski.com>
>> > http://www.piskorski.com/
>> > _______________________________________________
>> > R-sig-hpc mailing list
>> > R-sig-hpc at r-project.org
>> > https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
Phone: +34-91-732-8000 ext. 3019