[R-sig-hpc] ff and parallel computing (multiple nodes, shared disk)
bcarvalh at jhsph.edu
Sat Nov 14 12:57:32 CET 2009
When I tried this, I didn't even put the solution on a cluster. Using
my own laptop, I had an object with 8M rows, and it took about 1m30s to
retrieve 20 random rows. Saving a transposed version of the matrix
took much more time, but data retrieval was way faster: I (think I)
got something like 2-3 seconds to retrieve 20 rows from this big
object. ff and bigmemory take a fraction of a second, but the comparison
seems a bit "unfair", as some sort of caching happens.
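To make the layout point concrete, here is a minimal base R sketch (not Benilton's actual code; the sizes are scaled far down from the 8M-row object): pulling a row out of a column-major on-disk matrix costs one seek-and-read per column, while in a transposed copy the same row is a single contiguous block.

```r
# Toy stand-in for the on-disk matrix: column-major doubles in a flat file.
nr <- 1000; nc <- 50
m <- matrix(as.double(seq_len(nr * nc)), nr, nc)

f <- tempfile()
con <- file(f, "wb")
writeBin(as.vector(m), con, size = 8)   # R matrices are column-major
close(con)

# Fetching row i means one seek + one read per column: nc scattered reads.
read_row <- function(path, i, nr, nc) {
  con <- file(path, "rb"); on.exit(close(con))
  vapply(seq_len(nc) - 1L, function(j0) {
    seek(con, (j0 * nr + (i - 1)) * 8)  # element (i, j0 + 1) of the matrix
    readBin(con, "double", 1L, size = 8)
  }, numeric(1))
}

# Transposed copy: each original row is now stored contiguously.
ft <- tempfile()
con <- file(ft, "wb")
writeBin(as.vector(t(m)), con, size = 8)
close(con)

read_row_t <- function(path, i, nc) {
  con <- file(path, "rb"); on.exit(close(con))
  seek(con, (i - 1) * nc * 8)           # single seek, single sequential read
  readBin(con, "double", nc, size = 8)
}

row7  <- read_row(f, 7L, nr, nc)
row7t <- read_row_t(ft, 7L, nc)
```

On a real disk the scattered-read version is where the minutes go; the transposed layout turns each row fetch into one sequential read.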
On Nov 14, 2009, at 8:02 AM, Andrew Piskorski wrote:
> On Thu, Nov 12, 2009 at 04:29:34PM -0200, Benilton Carvalho wrote:
>> I wrote my own code to use NetCDF, which doesn't perform well when I
>> need random access to the data.
> What sort of I/O numbers do you actually see?
> You're hitting a single shared disk server with random-access I/O
> requests from multiple nodes? If so, isn't that probably the problem
> right there? Random access is a disk-speed killer. I wouldn't expect
> playing with NetCDF vs. SQLite vs. ff vs. bigmemory to make much
> difference. Things I'd expect might help in that case would be:
> - Massively faster shared disk I/O (hardware upgrade).
> - Moving I/O to the slave nodes.
> - Perhaps running an RDBMS that knows how to better optimize incoming
> client I/O requests.
> Or is your situation a bit different from the original poster's, and
> your code is I/O-limited even with just one node?
> Andrew Piskorski <atp at piskorski.com>
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org