[R] project parallel help
Jeff Newmiller
jdnewmil at dcn.davis.CA.us
Tue Oct 15 21:05:29 CEST 2013
As parameters. For example, if you have 100 simulations, set up a list of 4 distinct blocks of data (1:25, 26:50, etc.) and let parLapply call the single-threaded processing function once per block. Each instance of the processing function then won't return until it has completed its 25 simulations.
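A minimal sketch of that blocking pattern might look like the following (run_sim, the simulated inputs, and the 4-worker cluster are illustrative placeholders, not from the original post):

library(parallel)

## hypothetical per-simulation function: one input vector -> one result
run_sim <- function(x) sum(x^2)

## hypothetical inputs for 100 simulations
sim_inputs <- lapply(1:100, function(i) rnorm(1000))

cl <- makeCluster(4)

## send the worker function to every node once, up front
clusterExport(cl, "run_sim")

## split the 100 simulations into 4 blocks (1:25, 26:50, ...)
blocks <- split(sim_inputs, cut(seq_along(sim_inputs), 4, labels = FALSE))

## parLapply dispatches 4 tasks instead of 100; each worker loops
## serially over its own block of 25 simulations before returning
results <- parLapply(cl, blocks, function(block) lapply(block, run_sim))

stopCluster(cl)
results <- unlist(results, recursive = FALSE)  # back to 100 results

The point is that each of the four parLapply tasks pays only one round of dispatch and serialization overhead, however many simulations it runs.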
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>     Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
Jeffrey Flint <jeffrey.flint at gmail.com> wrote:
>How can I copy distinct blocks of data to each process?
>
>On Mon, Oct 14, 2013 at 10:21 PM, Jeff Newmiller
><jdnewmil at dcn.davis.ca.us> wrote:
>> The session info is helpful. To the best of my knowledge there is no
>> easy way to share memory between R processes other than forking. You
>> can use clusterExport to make "global" copies of large data structures
>> in each process and pass index values to your function to reduce copy
>> costs, at the price of extra data copies in each process that won't be
>> used. Or you can copy distinct blocks of data to each process and use
>> single-threaded processing to loop over the blocks within the workers
>> to reduce the number of calls to workers. However, I don't claim to be
>> an expert with the parallel package, so others may have better advice.
>> With two cores I don't usually get better than a 30% speedup... the
>> best payoff comes with four or more workers.
>>
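A hedged sketch of the "export once, pass indices" pattern described above (the matrix, the column blocks, and colSums stand in for whatever the real data and computation are):

library(parallel)

## illustrative large object; the real data could be anything
big_matrix <- matrix(rnorm(1000 * 100), nrow = 1000)

cl <- makeCluster(4)

## copy the large object into each worker's global environment once
clusterExport(cl, "big_matrix")

## after that, only small index vectors are sent with each task; every
## worker reads the columns it needs from its own copy of big_matrix
idx_blocks <- split(1:ncol(big_matrix), cut(1:ncol(big_matrix), 4, labels = FALSE))
res <- parLapply(cl, idx_blocks,
                 function(idx) colSums(big_matrix[, idx, drop = FALSE]))

stopCluster(cl)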
>>
>> Jeffrey Flint <jeffrey.flint at gmail.com> wrote:
>>>Jeff:
>>>
>>>Thank you for your response. Please let me know how I can
>>>"unhandicap" my question. I tried my best to be concise. Maybe this
>>>will help:
>>>
>>>> version
>>>               _
>>>platform       i386-w64-mingw32
>>>arch           i386
>>>os             mingw32
>>>system         i386, mingw32
>>>status
>>>major          3
>>>minor          0.2
>>>year           2013
>>>month          09
>>>day            25
>>>svn rev        63987
>>>language       R
>>>version.string R version 3.0.2 (2013-09-25)
>>>nickname       Frisbee Sailing
>>>
>>>
>>>I understand your comment about forking. You are right that forking
>>>is not available on windows.
>>>
>>>What I am curious about is whether or not I can direct the execution
>>>of the parallel package's functions to diminish the overhead. My
>>>guess is that there is overhead in copying the function to be
>>>executed at each iteration and there is overhead in copying the data
>>>to be used at each iteration. Are there any paradigms in the package
>>>parallel to reduce these overheads? For instance, I could use
>>>clusterExport to establish the function to be called. But I don't
>>>know if there is a technique whereby I could point to the data to be
>>>used by each CPU so as to prevent a copy.
>>>
>>>Jeff
>>>
>>>
>>>
>>>On Mon, Oct 14, 2013 at 2:35 PM, Jeff Newmiller
>>><jdnewmil at dcn.davis.ca.us> wrote:
>>>> Your question misses on several points in the Posting Guide so any
>>>> answers are handicapped by you.
>>>>
>>>> There is an overhead in using parallel processing, and the value of
>>>> two cores is marginal at best. In general parallel by forking is
>>>> more efficient than parallel by SNOW, but the former is not
>>>> available on all operating systems. This is discussed in the
>>>> vignette for the parallel package.
>>>>
>>>>
>>>> Jeffrey Flint <jeffrey.flint at gmail.com> wrote:
>>>>>I'm running package parallel in R-3.0.2.
>>>>>
>>>>>Below are the execution times using system.time for when executing
>>>>>serially versus in parallel (with 2 cores) using parRapply.
>>>>>
>>>>>
>>>>>Serially:
>>>>> user system elapsed
>>>>> 4.67 0.03 4.71
>>>>>
>>>>>
>>>>>
>>>>>Using package parallel:
>>>>> user system elapsed
>>>>> 3.82 0.12 6.50
>>>>>
>>>>>
>>>>>
>>>>>There is evident improvement in the user cpu time, but a big jump
>>>>>in the elapsed time.
>>>>>
>>>>>In my code, I am executing a function on a 1000 row matrix 100
>>>>>times, with the data different each time of course.
>>>>>
>>>>>The initial call to makeCluster cost 1.25 seconds in elapsed time.
>>>>>I'm not concerned about the makeCluster time since that is a fixed
>>>>>cost. I am concerned about the additional 1.43 seconds in elapsed
>>>>>time (6.50=1.43+1.25).
>>>>>
>>>>>I am wondering if there is a way to structure the code to largely
>>>>>avoid the 1.43 second overhead. For instance, perhaps I could
>>>>>upload the function to both cores manually, to avoid the function
>>>>>being uploaded at each of the 100 iterations? Also, I am wondering
>>>>>if there is a way to avoid any copying that is occurring at each of
>>>>>the 100 iterations?
>>>>>
>>>>>
>>>>>Thank you.
>>>>>
>>>>>Jeff Flint
>>>>>
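For reference, a minimal sketch of the kind of timing comparison described in the question; the 1000-row matrix, the per-row function f, and the 2-worker cluster are illustrative, not the poster's actual code:

library(parallel)

m <- matrix(rnorm(1000 * 50), nrow = 1000)  # illustrative 1000-row matrix
f <- function(row) sum(row^2)               # illustrative per-row function

## serial baseline: 100 repeated applications over the rows
system.time(for (i in 1:100) apply(m, 1, f))

## parallel version: the same work dispatched through a 2-worker PSOCK
## cluster; each of the 100 parRapply calls pays its own dispatch and
## serialization overhead, which shows up as extra elapsed time
cl <- makeCluster(2)
system.time(for (i in 1:100) parRapply(cl, m, f))
stopCluster(cl)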