[R] project parallel help
Jeffrey Flint
jeffrey.flint at gmail.com
Tue Oct 15 03:10:42 CEST 2013
Jeff:
Thank you for your response. Please let me know how I can
"unhandicap" my question. I tried my best to be concise. Maybe this
will help:
> version
               _
platform       i386-w64-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          3
minor          0.2
year           2013
month          09
day            25
svn rev        63987
language       R
version.string R version 3.0.2 (2013-09-25)
nickname       Frisbee Sailing
I understand your comment about forking. You are right that forking
is not available on windows.
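For what it's worth, the portable route on Windows is a PSOCK cluster; a
minimal sketch (the worker count of 2 is arbitrary):

```r
library(parallel)

## mclapply relies on fork() and silently runs serially on Windows,
## so a PSOCK (socket) cluster is the portable way to get real
## worker processes there.
cl <- makeCluster(2)                        # two worker processes
squares <- parLapply(cl, 1:4, function(x) x^2)
stopCluster(cl)
print(unlist(squares))
```

Each PSOCK worker is a fresh R process, which is exactly why arguments
and functions must be serialized over to it rather than shared as with
a fork.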
What I am curious about is whether I can direct the execution of the
parallel package's functions so as to reduce this overhead. My guess
is that there is overhead both in copying the function to be executed
and in copying the data it uses, at each iteration. Are there any
paradigms in the parallel package to reduce these overheads? For
instance, I could use clusterExport to establish the function to be
called once, up front. But I don't know of a technique for pointing
each CPU at the data it needs so as to avoid a copy.
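To illustrate the idea (the names `f` and `lookup` here are hypothetical
stand-ins for the real function and shared data): export both to the
workers once with clusterExport, so that each subsequent parRapply call
only serializes the per-iteration matrix.

```r
library(parallel)

cl <- makeCluster(2)

## Hypothetical example objects: `f` is the per-row function and
## `lookup` is shared data that every iteration needs.
f <- function(row, lookup) sum(row) + lookup$offset
lookup <- list(offset = 10)

## Ship the function and the shared data to the workers once, up
## front, instead of letting them be re-serialized on every call.
clusterExport(cl, c("f", "lookup"))

## Each iteration now only transmits the matrix itself; `f` and
## `lookup` are resolved in the workers' global environments.
results <- vector("list", 3)
for (i in 1:3) {
  m <- matrix(1, nrow = 4, ncol = 5)   # stand-in for per-iteration data
  results[[i]] <- parRapply(cl, m, function(row) f(row, lookup))
}
stopCluster(cl)
```

Note this only avoids re-copying objects that are the same across
iterations; data that genuinely changes each time still has to cross
the socket to the workers.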
Jeff
On Mon, Oct 14, 2013 at 2:35 PM, Jeff Newmiller
<jdnewmil at dcn.davis.ca.us> wrote:
> Your question misses on several points in the Posting Guide, so any answers are handicapped.
>
> There is an overhead in using parallel processing, and the value of two cores is marginal at best. In general parallel by forking is more efficient than parallel by SNOW, but the former is not available on all operating systems. This is discussed in the vignette for the parallel package.
> ---------------------------------------------------------------------------
> Jeff Newmiller  <jdnewmil at dcn.davis.ca.us>
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> Jeffrey Flint <jeffrey.flint at gmail.com> wrote:
>>I'm running package parallel in R-3.0.2.
>>
>>Below are the execution times using system.time for when executing
>>serially versus in parallel (with 2 cores) using parRapply.
>>
>>
>>Serially:
>> user system elapsed
>> 4.67 0.03 4.71
>>
>>
>>
>>Using package parallel:
>> user system elapsed
>> 3.82 0.12 6.50
>>
>>
>>
>>There is evident improvement in the user cpu time, but a big jump in
>>the elapsed time.
>>
>>In my code, I am executing a function on a 1000 row matrix 100 times,
>>with the data different each time of course.
>>
>>The initial call to makeCluster cost 1.25 seconds in elapsed time.
>>I'm not concerned about the makeCluster time since that is a fixed
>>cost. I am concerned about the additional 1.43 seconds in elapsed
>>time (6.50=1.43+1.25).
>>
>>I am wondering if there is a way to structure the code to largely
>>avoid the 1.43 second overhead. For instance, perhaps I could
>>upload the function to both cores manually in order to avoid the
>>function being uploaded at each of the 100 iterations? Also, I am
>>wondering if there is a way to avoid any copying that is occurring at
>>each of the 100 iterations?
>>
>>
>>Thank you.
>>
>>Jeff Flint
>>
>>______________________________________________
>>R-help at r-project.org mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide
>>http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.
>