[R] Appropriateness of R functions for multicore

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Aug 20 08:33:11 CEST 2013


On 20/08/2013 03:13, Hopkins, Bill wrote:
> I wrap functions to run via multicore with tryCatch() to gather stats on failure rate and capture state.
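
A minimal sketch of that kind of wrapper, assuming the parallel package that ships with R. The names safely(), flaky() and the "worker_failure" class are hypothetical and only illustrate the idea; they are not taken from the poster's code.

```r
library(parallel)

# Hypothetical wrapper: run f(x), but on error return a small record of the
# offending input and the error message instead of letting the error escape.
safely <- function(f) {
  function(x, ...) {
    tryCatch(
      f(x, ...),
      error = function(e) {
        structure(list(input = x, message = conditionMessage(e)),
                  class = "worker_failure")
      }
    )
  }
}

# Toy task (an assumption for illustration) that fails for even inputs.
flaky <- function(x) {
  if (x %% 2 == 0) stop("cannot handle ", x)
  x^2
}

# Forking is not available on Windows, where mclapply() needs mc.cores = 1.
n_workers <- if (.Platform$OS.type == "unix") 2L else 1L
results <- mclapply(1:6, safely(flaky), mc.cores = n_workers)

# Gather failure statistics and the state that was captured.
failed <- vapply(results, inherits, logical(1), what = "worker_failure")
failure_rate <- mean(failed)
failed_inputs <- vapply(results[failed], function(r) r$input, integer(1))
```

Because each failure record carries its input, the failing cases can be rerun serially, where browser() and the usual interactive tools work.
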
>
>
>
> I'm still interested in how/whether core functions were verified as being threadsafe.

What does 'threads' have to do with this?  Multicore forks processes; 
it does not spawn threads.  And the manuals contain advice about what is 
known not to be safe usage in that case -- it is not the core functions 
but what you do with them that matters.
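
The process/thread distinction can be checked directly. The following is only a sketch, assuming a Unix-alike and the parallel package (which provides mclapply()); the variable names are illustrative.

```r
library(parallel)

# Forking is not available on Windows, where mclapply() needs mc.cores = 1
# and simply runs everything in the calling process.
n_workers <- if (.Platform$OS.type == "unix") 2L else 1L

parent_pid <- Sys.getpid()

# Each element is evaluated in a worker, which reports its own process ID.
child_pids <- unlist(mclapply(1:4, function(i) Sys.getpid(),
                              mc.cores = n_workers))

# When workers are forked, none of them shares the parent's process ID.
forked <- all(child_pids != parent_pid)
print(forked)
```

Each forked child starts with a copy of the parent's state, including open connections and graphics devices, which is why what is done inside the workers matters more than which functions are called.
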

As for threads: see the manuals e.g. 
http://cran.r-project.org/doc/manuals/r-release/R-exts.html#OpenMP-support .

>
>
>
> Bill Hopkins
>
>
>
> Written using a virtual Android keyboard...
>
> ------ Original message ------
> From: Jeff Newmiller
> Date: 8/19/2013 5:18 PM
> To: Patrick Connolly;
> Cc: Hopkins, Bill;r-help at R-project.org;
> Subject:Re: [R] Appropriateness of R functions for multicore
>
> I don't know... I suppose it depends how it fails. I recommend that you restrict yourself to using only the data that was passed as parameters to your parallel function. You may be able to tackle parts of the task and return only those partial results to confirm how far through the code you can get.
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                        Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> Patrick Connolly <p_connolly at slingshot.co.nz> wrote:
>> On Sat, 17-Aug-2013 at 05:09PM -0700, Jeff Newmiller wrote:
>>
>>
>> |> In most threaded multitasking environments it is not safe to
>> |> perform IO in multiple threads. In general you will have difficulty
>> |> performing IO in parallel processing so it is best to let the
>> |> master hand out data to worker tasks and gather results from them
>> |> for storage. Keep in mind that just because you have eight cores
>> |> for processing doesn't mean you have eight hard disks, so if your
>> |> problem is IO bound in single processor operation then it will also
>> |> be IO bound in threaded operation.
>>
>> For tasks which don't involve I/O but fail with mclapply, how does one
>> work out where the problem is?  The handy browser() function which
>> allows for interactive diagnosis won't work with parallel jobs.
>>
>> What other approaches can one use?
>>
>> Thanx
>>
>> |>
>> |> "Hopkins, Bill" <Bill.Hopkins at Level3.com> wrote:
>> |> >Has there been any systematic evaluation of which core R functions
>> |> >are safe for use with multicore? Of current interest, I have tried
>> |> >calling read.table() via mclapply() to more quickly read in hundreds
>> |> >of raw data files (I have a 24 core system with 72 GB running Ubuntu,
>> |> >a perfect platform for multicore). There was a 40% failure rate,
>> |> >which doesn't occur when I invoke read.table() serially from within a
>> |> >single thread. Another example was using pvec() to invoke
>> |> >sapply(strsplit(),...) on a huge character vector (to pull out fields
>> |> >from within a field). It looked like a perfect application for
>> |> >pvec(), but it fails when serial execution works.
>> |> >
>> |> >I thought I'd ask before taking on the task of digging into the
>> |> >underlying code to see what might be causing failure in a multicore
>> |> >(well, multi-threaded) context.
>> |> >
>> |> >As an alternative, I could define multiple cluster nodes locally, but
>> |> >that shifts the tradeoff a bit in whether parallel execution is
>> |> >advantageous - the overhead is significantly more, and even with 72
>> |> >GB, it does impose greater limits on how many cores can be used.
>> |> >
>> |> >Bill Hopkins
>> |> >
>> |> >______________________________________________
>> |> >R-help at r-project.org mailing list
>> |> >https://stat.ethz.ch/mailman/listinfo/r-help
>> |> >PLEASE do read the posting guide
>> |> >http://www.R-project.org/posting-guide.html
>> |> >and provide commented, minimal, self-contained, reproducible code.
>> |>
>


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


