[R] Appropriateness of R functions for multicore

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Sun Aug 18 02:09:37 CEST 2013


In most threaded multitasking environments it is not safe to perform I/O from multiple threads. In general you will have difficulty performing I/O in parallel, so it is best to let the master process hand out data to the worker tasks and gather their results for storage. Keep in mind that having eight cores for processing doesn't mean you have eight hard disks: if your problem is I/O-bound when run on a single processor, it will still be I/O-bound when run in parallel.
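
A minimal sketch of the pattern described above, not code from either poster: the master reads every file serially, the workers only do computation on data already in memory, and the master gathers the results. The temporary files, the names raw, sums and result, and the per-file sum are all hypothetical stand-ins for the real data and processing. It assumes the parallel package that ships with R; mclapply() relies on forking, so the sketch falls back to one core on Windows.

```r
library(parallel)

# Hypothetical setup: write a few small files so the sketch is self-contained.
tmp_dir <- tempfile("raw_")
dir.create(tmp_dir)
for (i in 1:4) {
  write.table(data.frame(id = i, x = 1:5 * i),
              file = file.path(tmp_dir, sprintf("raw_%02d.txt", i)),
              row.names = FALSE)
}
files <- sort(list.files(tmp_dir, full.names = TRUE))

# Step 1: the master does all of the disk I/O, one file at a time.
raw <- lapply(files, read.table, header = TRUE)

# Step 2: the workers receive in-memory data and do computation only.
# Forking is unavailable on Windows, so use a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
sums <- mclapply(raw, function(d) sum(d$x), mc.cores = n_cores)

# Step 3: the master gathers the results for storage.
result <- data.frame(file = basename(files), total = unlist(sums))
print(result)
```

If reading the files is itself the bottleneck, this arrangement will not speed that part up; it only keeps the I/O out of the workers.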
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

"Hopkins, Bill" <Bill.Hopkins at Level3.com> wrote:
>Has there been any systematic evaluation of which core R functions are
>safe for use with multicore? Of current interest, I have tried calling
>read.table() via mclapply() to more quickly read in hundreds of raw
>data files (I have a 24 core system with 72 GB running Ubuntu, a
>perfect platform for multicore). There was a 40% failure rate, which
>doesn't occur when I invoke read.table() serially from within a single
>thread. Another example was using pvec() to invoke
>sapply(strsplit(),...) on a huge character vector (to pull out fields
>from within a field). It looked like a perfect application for pvec(),
>but it fails when serial execution works.
>
>I thought I'd ask before taking on the task of digging into the
>underlying code to see what might be causing failure in a multicore
>(well, multi-threaded) context.
>
>As an alternative, I could define multiple cluster nodes locally, but
>that shifts the tradeoff a bit in whether parallel execution is
>advantageous - the overhead is significantly more, and even with 72 GB,
>it does impose greater limits on how many cores can be used.
>
>Bill Hopkins
>


