[BioC] 'parallel' vs 'multicore'
Martin Morgan
mtmorgan at fhcrc.org
Thu Oct 6 21:37:38 CEST 2011
Hi Tim
On 10/06/2011 12:26 PM, Tim Triche, Jr. wrote:
> Out of curiosity, why would memory be less of an issue with SNOW than
> with mclapply?
I meant to leave the opposite impression -- that mclapply will generally
be better with memory than snow.
> My intuition is that, as soon as the data in a child process' image
> diverges from the parent process', the memory usage will get pretty
> savage either way. At least, that matches what I remember about how
> fork() works and what I see when I run diverging children. They
> misbehave on occasion, as children are wont to do.
In principle, and as I understand it, fork should be copy-on-write.
Objects shared between processes are not duplicated in memory until
modified, so any data that is effectively read-only is handled better by
multicore. Also, snow will serialize / unserialize objects to send them
to children, and this can be quite slow for large objects. Both snow and
multicore rely on serialization for return values, which is a strong
reason to keep the return value small -- a vector of counts of reads
overlapping regions of interest, rather than the reads themselves.
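
As a rough sketch of that idea (the data, the 0.5 threshold and the core
count are made up for illustration, not taken from any real workflow),
each worker can read the shared data without modifying it and hand back a
single number:

```r
library(parallel)

# Hypothetical stand-in data: a list of large numeric vectors playing the
# role of per-sample reads.
set.seed(1)
reads <- replicate(4, runif(1e5), simplify = FALSE)

# Forking is unavailable on Windows, so use a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each worker only reads its element and returns one small summary value,
# so very little has to be serialized back to the parent process.
counts <- mclapply(reads, function(x) sum(x > 0.5), mc.cores = cores)
counts <- unlist(counts)
```

Returning `x` itself from the worker function would instead push every
vector back through serialization.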
> Anyways -- would it be out of the question for 'parallel' to export a
> dummy function like
>
> mclapply <- lapply
>
> on Windows? Maybe I'll go post that on r-dev so that Prof. Ripley can
> bite my head off :-)
yes that's your best bet!
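
In the meantime a package can carry its own fallback. A minimal sketch,
assuming a hypothetical helper called portable_lapply (not something
'parallel' exports) and an arbitrary default of two cores:

```r
library(parallel)

# Hypothetical helper: use mclapply where forking is available and plain
# lapply everywhere else (Windows, or when only one core is requested).
portable_lapply <- function(X, FUN, ..., cores = 2L) {
  if (.Platform$OS.type == "windows" || cores < 2L) {
    lapply(X, FUN, ...)
  } else {
    mclapply(X, FUN, ..., mc.cores = cores)
  }
}

squares <- portable_lapply(1:4, function(i) i^2)
```

Callers get a list back either way, so the rest of the code does not need
to know which branch ran.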
Martin
> For all the shortcomings of foreach() / doMC() and friends, their
> default (run serially) was/is sensible.
>
>
>
> On Thu, Oct 6, 2011 at 12:09 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>
> On 10/06/2011 10:21 AM, Tim Triche, Jr. wrote:
>
> I have a lot of methods in methylumi (the revised version) that will
> happily parallelize themselves for (e.g.) loading hundreds of IDAT files,
> background correcting and normalizing anything in sight, etc. Sometimes
> it's easier to parallelize things until I can find time to make them
> properly efficient (boooo!).
> When I compiled HEAD for R-2.14 the other day, after installing it, I
> typed
>
> library(parallel)
>
> And all the handy bits of snow and multicore were in there! If I switch
> to the 'parallel' package, by default, will I now be OK and not screw
> Windows users? Everything works great on Linux/Unix, and has done so for
> months, with 'multicore'. It seems like there aren't any substantial
> differences other than things "just work" for a base installation -- do
> other package authors anticipate moving over now that this is slated to
> be in the stable release?
>
>
> Yes you and other developers should switch to parallel; it seems to
> be the wave of the future.
>
> Likely your DESCRIPTION file should have
>
> Imports: parallel
>
> and your NAMESPACE
>
> import(parallel)
>
> Importing all of parallel seems to be the best solution, because the
> available symbols depend on platform, e.g., mclapply on Linux / Mac
> but not Windows.
>
> It's still the case that mclapply, for instance, is not supported on
> Windows so your code needs to have some conditional evaluation --
> exists("mclapply", "package:parallel").
>
> If memory weren't an issue, then the 'sockets' interface from SNOW
> would be the most portable.
>
> Martin
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>
>
>
>
> --
> If people do not believe that mathematics is simple,
> it is only because they do not realize how complicated life is.
>
>
> John von Neumann
> <http://www-groups.dcs.st-and.ac.uk/~history/Biographies/Von_Neumann.html>
>
>
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793