[Rd] Speeding up build-from-source

Adam Seering adam at seering.org
Sat Apr 27 17:34:58 CEST 2013



On 04/27/2013 09:10 AM, Martin Morgan wrote:
> On 04/26/2013 07:50 AM, Adam Seering wrote:
>> Hi,
>>      I've been playing around with the R source code a little; mostly just
>> trying to familiarize myself.  I have access to some computers on a
>> reservation system; so I've been reserving a computer, downloading and
>> compiling R, and going from there.
>>
>>      I'm finding that R takes a long time to build, though.  (Well, ok,
>> maybe 5 minutes -- I'm impatient :-) )  Most of that time, it's sitting
>> there byte-compiling some internal package or another, which uses just one
>> CPU core so leaves the system mostly idle.
>>
>>      I'm just curious if anyone has thought about parallelizing that
>> process?
>
> Hi Adam -- parallel builds are supported by adding the '-j' flag when
> you invoke make
>
>    make -j
>
> The packages are built in parallel, inasmuch as their dependency
> structure allows. Also, you can configure without byte compilation (see
> ~/src/R-devel/configure --help) to make this part of the build go more
> quickly. And after an initial build, subsets of R, e.g., just the 'main'
> source or a single package like 'stats', can be rebuilt with the
> following (assuming R's source, e.g., from svn, is in ~/src/R-devel, and
> you're building R in ~/bin/R-devel):
>
>    cd ~/bin/R-devel/src/main
>    make -j
>    cd ~/bin/R-devel/src/library/stats
>    make -j
>
> The definitive source for answers to questions like these is
>
>    > RShowDoc("R-admin")
>
> Martin

Hi Martin,
	Thanks for the reply -- but I'm afraid the question you've answered 
isn't the question that I intended to ask.

	Based on your response, I think the answer to my question is likely 
"no."  But let me try rephrasing anyway, just in case:

	I'm certainly quite aware of "-j" as a make argument; if I weren't, the 
bottleneck would not be the byte-compilation, and the build would take 
rather more than 5 minutes :-)  That was the very first thing I tried. 
I don't believe that parallel make is as parallel as it theoretically 
could be.  (In fact, I see almost no parallelism between libraries on my 
system; individual .c files are parallelized nicely but only one library 
at a time.  This mostly matters at the compiling-bytecode step, since 
that's the biggest serial operation per library.)  My question is, has 
anyone thought about what it would take to parallelize the build further?

	I'm not sure that this can be done with just the makefiles.  But the 
following comment makes me at least a little suspicious:

""" src/library/Makefile
## FIXME: do some of this in parallel?
"""

	Surely some of the 'for' loops there could be unwound into proper make 
targets with dependency information?  I'm not sure, though, whether the 
dependencies between libraries would effectively force a serial build anyway.
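
	To make the dependency question concrete, here is a minimal sketch (in 
Python, just because it's quick to write) of how one could check how much 
parallelism a dependency graph leaves available.  The package-to-dependency 
map below is made up for illustration; I have not read it out of R's actual 
sources, and build_waves is just a name I picked.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# HYPOTHETICAL dependency map: package -> packages that must be built first.
# Illustrative only; not taken from R's real package metadata.
DEPS = {
    "base": set(),
    "tools": {"base"},
    "utils": {"base", "tools"},
    "grDevices": {"base"},
    "graphics": {"grDevices"},
    "stats": {"graphics", "utils"},
}


def build_waves(deps):
    """Group packages into waves; packages in one wave have no
    unmet dependencies on each other and could be built concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything buildable right now
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(build_waves(DEPS), start=1):
        print(number, wave)
```

	If every wave contains a single package, the dependencies force a 
serial build no matter how the makefile is written; any wave with more than 
one entry is parallelism that make could in principle exploit.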

	Another approach, if the above is hard for some reason:  What I'm 
seeing is that the byte compilation is largely serial; but as you note, 
byte-compilation is optional.  Could the makefiles just defer it (skip 
it up front), and then do the byte-compilation for all of the packages 
concurrently?  From a very cursory read of the code, it looks like the 
relevant code is in src/library/tools/R/makeLazyLoad.R, and that file 
doesn't immediately look like it's doing anything that fundamentally 
couldn't be parallelized (i.e., by running multiple R processes at once, 
one per library; at a glance the logic looks nicely per-library).
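
	Roughly the shape I have in mind for the "one process per library" 
idea, as a sketch only.  The per-package command here is a placeholder that 
just prints a message; I haven't worked out what the real byte-compile 
invocation would be, and run_all / placeholder_command are names I made up.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def placeholder_command(pkg):
    # PLACEHOLDER: stands in for the real per-package byte-compile step,
    # which is not worked out here. It only prints a message.
    return [sys.executable, "-c", f"print('compiled {pkg}')"]


def run_all(packages, make_command=placeholder_command, max_workers=4):
    """Start one child process per package, at most max_workers at a time.
    Returns a dict mapping each package name to its exit code."""

    def run_one(pkg):
        result = subprocess.run(make_command(pkg), capture_output=True, text=True)
        return pkg, result.returncode

    # Threads are enough here: each one just waits on its child process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, packages))


if __name__ == "__main__":
    print(run_all(["stats", "graphics", "utils"]))
```

	This ignores dependencies between packages entirely, so it would only 
be safe if the deferred step for one package really doesn't need the 
byte-compiled form of another.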

	A third approach could be to try to parallelize the logic in 
makeLazyLoad.R.  I would expect that to be at best much more difficult, 
though.

	Anyway, there are lots of things that look like they could in theory be 
done here.  And I know just enough at this point to be dangerous; not 
enough to contribute :-)  Hence my asking, has anyone thought about 
this?  If not, I assume the best thing for me to do would be to poke at 
it and try to figure out on my own how this works and what's most 
feasible.  But if anyone has any pointers, that would likely save me a 
bunch of time.  And if this is something that you prefer to keep serial 
for some reason, that would be good to know too, so I don't spend time 
on it.

Thanks,
Adam




