[R] Appropriateness of R functions for multicore
Patrick Connolly
p_connolly at slingshot.co.nz
Mon Aug 19 22:08:06 CEST 2013
On Sat, 17-Aug-2013 at 05:09PM -0700, Jeff Newmiller wrote:
|> In most threaded multitasking environments it is not safe to
|> perform IO in multiple threads. In general you will have difficulty
|> performing IO in parallel processing so it is best to let the
|> master hand out data to worker tasks and gather results from them
|> for storage. Keep in mind that just because you have eight cores
|> for processing doesn't mean you have eight hard disks, so if your
|> problem is IO bound in single processor operation then it will also
|> be IO bound in threaded operation.
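A minimal sketch of the "master hands out data, workers process it" pattern described above, assuming the parallel package on a Unix-alike. The temporary files and their contents are made up for illustration and are not from the original poster's data; whether this is faster than a plain serial read.table() loop will depend on how much of the time is parsing rather than disk access.

```r
library(parallel)

# Hypothetical input: a few small CSV files written to a temp directory.
tmp_dir <- tempfile("mc_io_")
dir.create(tmp_dir)
files <- file.path(tmp_dir, sprintf("raw_%02d.csv", 1:4))
for (f in files) {
  write.csv(data.frame(x = 1:5, y = (1:5)^2), f, row.names = FALSE)
}

# Step 1: the master process does all disk reads, one file at a time.
raw_text <- lapply(files, readLines)

# Step 2: forked workers only parse text that is already in memory.
# mclapply() cannot fork on Windows, so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
parsed <- mclapply(raw_text, function(lines) {
  read.csv(text = lines)
}, mc.cores = cores)

# Step 3: the master gathers the pieces and does any further storage itself.
combined <- do.call(rbind, parsed)
```

Only the master touches the disk here; the workers receive character vectors and return data frames.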
For tasks which don't involve I/O but fail with mclapply, how does one
work out where the problem is? The handy browser() function which
allows for interactive diagnosis won't work with parallel jobs.
What other approaches can one use?
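One approach that might help, sketched here under assumptions: wrap the worker function in tryCatch() so that each failure comes back to the master as ordinary data, then rerun only the failing inputs serially where browser() and debug() work again. The functions risky() and safe() are hypothetical names invented for this example.

```r
library(parallel)

# Hypothetical worker function that fails for one particular input.
risky <- function(x) {
  if (x == 3) stop("bad input: ", x)
  sqrt(x)
}

# Wrapper: return the error as data instead of losing it in the forked child.
safe <- function(x) {
  tryCatch(
    list(ok = TRUE, value = risky(x)),
    error = function(e) {
      list(ok = FALSE, input = x, message = conditionMessage(e))
    }
  )
}

inputs <- 1:5
# mclapply() cannot fork on Windows, so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
results <- mclapply(inputs, safe, mc.cores = cores)

# Work out which inputs failed and what the error messages were.
failed <- Filter(function(r) !r$ok, results)
failed_inputs <- vapply(failed, function(r) r$input, integer(1))

# Then, interactively and serially, something like:
#   debugonce(risky); risky(failed_inputs[1])
```

This only catches R-level errors; a worker that crashes or is killed outright would not reach the tryCatch() handler.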
Thanks
---------------------------------------------------------------------------
|> Jeff Newmiller The ..... ..... Go Live...
|> DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
|> Live: OO#.. Dead: OO#.. Playing
|> Research Engineer (Solar/Batteries O.O#. #.O#. with
|> /Software/Embedded Controllers) .OO#. .OO#. rocks...1k
|> ---------------------------------------------------------------------------
|> Sent from my phone. Please excuse my brevity.
|>
|> "Hopkins, Bill" <Bill.Hopkins at Level3.com> wrote:
|> >Has there been any systematic evaluation of which core R functions are
|> >safe for use with multicore? Of current interest, I have tried calling
|> >read.table() via mclapply() to more quickly read in hundreds of raw
|> >data files (I have a 24 core system with 72 GB running Ubuntu, a
|> >perfect platform for multicore). There was a 40% failure rate, which
|> >doesn't occur when I invoke read.table() serially from within a single
|> >thread. Another example was using pvec() to invoke
|> >sapply(strsplit(),...) on a huge character vector (to pull out fields
|> >from within a field). It looked like a perfect application for pvec(),
|> >but it fails when serial execution works.
|> >
|> >I thought I'd ask before taking on the task of digging into the
|> >underlying code to see what might be causing failure in a multicore
|> >(well, multi-threaded) context.
|> >
|> >As an alternative, I could define multiple cluster nodes locally, but
|> >that shifts the tradeoff a bit in whether parallel execution is
|> >advantageous - the overhead is significantly more, and even with 72 GB,
|> >it does impose greater limits on how many cores can be used.
|> >
|> >Bill Hopkins
|> >
|> >______________________________________________
|> >R-help at r-project.org mailing list
|> >https://stat.ethz.ch/mailman/listinfo/r-help
|> >PLEASE do read the posting guide
|> >http://www.R-project.org/posting-guide.html
|> >and provide commented, minimal, self-contained, reproducible code.
|>
--
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
___ Patrick Connolly
{~._.~} Great minds discuss ideas
_( Y )_ Average minds discuss events
(:_~*~_:) Small minds discuss people
(_)-(_) ..... Eleanor Roosevelt
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.