[BioC] High-performance Bioconductor experiments

A.J. Rossini rossini at blindglobe.net
Thu Dec 11 09:29:47 MET 2003


"Michael Benjamin" <msb1129 at bellsouth.net> writes:

> Progress update (summarized from my forum for such matters at
> http://www.theschedule.net/forum/gforum.cgi?forum=20&do=forum_view):
>
> Briefly, I created a four-node cluster out of Pentium-III boxes and
> Debian Linux/openMosix.  I saw no significant performance boost of
> ReadAffy or expresso using the set of 165 .CEL files from Harvard.  None
> of the processes migrated, as they say in the world of high-performance
> computing.  R.bin runs in one process, and everything it does seems to
> stay in that process.  No real opportunity for parallelization here, at
> least not on openMosix.
>
> I'd like to analyze these chips in a reasonable amount of time, without
> paying Dell $45,000 for a 4-Xeon SMP server.
>
> I worry about what we'll do with 1,000 .CEL files.  The analytical techniques
> work well, but they're pretty slow even if your amp "goes to 11."
>
> Any thoughts?

Explicitly parallelize the routine.  openMosix is nice, but it's still
not a production environment for R.

That's why Michael Li and I wrote RPVM/RSPRNG and worked with
Luke Tierney on SNOW.  The tools are there, but someone has to do the
programming.  That means you can either hire someone with the money you
won't spend on software or hardware, or you can wait.
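"Explicitly parallelize" means the programmer, not the OS, splits the
work.  A minimal sketch of the idea follows, written in Python rather
than R (RPVM and SNOW are the R-side tools named above; this is not
their API).  It assumes the per-chip work is independent, so each .CEL
file can go to a separate worker process.  `summarize_chip` is a
hypothetical stand-in and is NOT ReadAffy or expresso; steps that
combine all arrays at once would still need a separate gather step.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import hashlib


def summarize_chip(path):
    # HYPOTHETICAL per-chip job: stands in for reading and summarizing
    # one .CEL file.  Here it just derives a deterministic number from
    # the file name so the sketch runs without real data.
    digest = hashlib.sha256(path.encode()).hexdigest()
    return path, int(digest[:8], 16) % 1000


def process_all(paths, workers=4, executor_cls=ProcessPoolExecutor):
    # Farm independent per-chip jobs out to a pool of workers and
    # collect the (path, result) pairs into a dict.
    with executor_cls(max_workers=workers) as pool:
        return dict(pool.map(summarize_chip, paths))


if __name__ == "__main__":
    # Hypothetical file names; 165 matches the size of the data set above.
    files = [f"chip{i:03d}.CEL" for i in range(165)]
    results = process_all(files)
    print(len(results), "chips summarized")
```

Passing the executor class in keeps the splitting logic separate from
the transport; a cluster version would swap in something that ships
jobs to other machines, which is where the data-movement cost below
comes in.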

That being said, a 4-way Xeon server isn't going to help parallelize
a single process, and you'd get the same work done with a remote
execution shell (e.g., firing off R BATCH or using Emacs/ESS-Elsewhere
on other machines).

The data-sharing/locality problem is an interesting one that will need
to be solved; I'm not sure how we'll go about that.  See our tech report
for an anecdotal example of how one can naively end up twice as slow
on the parallel system (later pathological examples that I've
constructed show the slowdown growing somewhat with the number of
processors) because of sending data "over the wire" between machines.
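A toy cost model shows how shipping data can make the parallel run
slower and get worse as processors are added.  It is an illustrative
assumption, not the model or the numbers from the tech report: the
master sends the full data set to each worker one after another, then
the compute is split evenly.

```python
def parallel_time(compute_s, per_worker_transfer_s, workers):
    """Toy elapsed-time model (ASSUMPTION, for illustration only):
    serial data shipping to every worker, then an even split of compute."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    transfer = workers * per_worker_transfer_s  # grows with workers
    compute = compute_s / workers               # shrinks with workers
    return transfer + compute


if __name__ == "__main__":
    # Made-up numbers: 10 s of compute, 10 s to ship the data once.
    for p in (1, 2, 4, 8):
        print(p, parallel_time(10.0, 10.0, p))
```

With these made-up figures the serial run takes 10 s, while two
workers take 25 s, and the total keeps rising with more workers
because the transfer term grows faster than the compute term shrinks.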

best,
-tony


-- 
rossini at u.washington.edu            http://www.analytics.washington.edu/ 
Biomedical and Health Informatics   University of Washington
Biostatistics, SCHARP/HVTN          Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC  (M/W): 206-667-7025 FAX=206-667-4812 | use Email
