[Rd] Speeding up build-from-source
Adam Seering
adam at seering.org
Sat Apr 27 17:34:58 CEST 2013
On 04/27/2013 09:10 AM, Martin Morgan wrote:
> On 04/26/2013 07:50 AM, Adam Seering wrote:
>> Hi,
>> I've been playing around with the R source code a little; mostly just
>> trying to familiarize myself. I have access to some computers on a
>> reservation system; so I've been reserving a computer, downloading and
>> compiling R, and going from there.
>>
>> I'm finding that R takes a long time to build, though. (Well, ok, maybe 5
>> minutes -- I'm impatient :-) ) Most of that time, it's sitting there
>> byte-compiling some internal package or another, which uses just one CPU
>> core so leaves the system mostly idle.
>>
>> I'm just curious if anyone has thought about parallelizing that process?
>
> Hi Adam -- parallel builds are supported by adding the '-j' flag when
> you invoke make
>
> make -j
>
> The packages are being built in parallel, inasmuch as this is possible
> given their dependency structure. Also, you can configure without byte
> compilation, see ~/src/R-devel/configure --help to make this part of the
> build go more quickly. And after an initial build, subsets of R, e.g.,
> just the 'main' source or a single package like 'stats', can be rebuilt
> (assuming R's source, e.g., from svn, is in ~/src/R-devel, and
> you're building R in ~/bin/R-devel) with
>
> cd ~/bin/R-devel/src/main
> make -j
> cd ~/bin/R-devel/src/library/stats
> make -j
>
> The definitive source for answers to questions like these is
>
> > RShowDoc("R-admin")
>
> Martin
Hi Martin,
Thanks for the reply -- but I'm afraid the question you've answered
isn't the question that I intended to ask.
Based on your response, I think the answer to my question is likely
"no." But let me try rephrasing anyway, just in case:
I'm certainly quite aware of "-j" as a make argument; if I weren't, the
bottleneck would not be the byte-compilation, and the build would take
rather more than 5 minutes :-) That was the very first thing I tried.
I don't believe that parallel make is as parallel as it theoretically
could be. (In fact, I see almost no parallelism between libraries on my
system; individual .c files are parallelized nicely but only one library
at a time. This mostly matters at the compiling-bytecode step, since
that's the biggest serial operation per library.) My question is, has
anyone thought about what it would take to parallelize the build further?
I'm not sure that this can be done with just the makefiles. But the
following comment makes me at least a little suspicious:
""" src/library/Makefile
## FIXME: do some of this in parallel?
"""
Surely some of the 'for' loops there could be unwound into proper make
targets with dependency information? I'm not sure, though, whether the
dependency information would effectively force a serial compilation anyway.
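A minimal sketch of the "proper make targets" idea, written as a shell
script that prints one rule per package. The package list, the dependency
edges, and the .stamp file naming are all invented for illustration; they
are not taken from src/library/Makefile. With rules of this shape, make -j
could start any package whose prerequisites are already built:

```shell
#!/bin/sh
# Hypothetical sketch: emit one make rule per package instead of looping.
# The dependency table fed in below is invented, not R's real one.

gen_rules() {
    # Each input line: a package name followed by the packages it needs.
    while read -r pkg deps; do
        [ -n "$pkg" ] || continue
        prereqs=""
        for d in $deps; do
            prereqs="$prereqs $d.stamp"
        done
        # Rule header: target plus its prerequisite stamp files.
        printf '%s.stamp:%s\n' "$pkg" "$prereqs"
        # Recipe line (tab-indented): build the package, then mark it done.
        printf '\t$(MAKE) -C %s && touch $@\n' "$pkg"
    done
}

gen_rules <<'EOF'
base
tools base
utils base tools
graphics base tools
stats base utils graphics
EOF
```

In this made-up table, utils and graphics share the same prerequisites, so
they are the only pair that could build concurrently; a real dependency
chain might leave just as little room.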
Another approach, if the above is hard for some reason: What I'm
seeing is that the byte compilation is largely serial; but as you note,
byte-compilation is optional. Could the makefiles just defer it, i.e., skip
it up front and then do all the byte-compilations for all of the
packages concurrently? From a very cursory read of the code, it looks
like the relevant code is in src/library/tools/R/makeLazyLoad.R, and
that file doesn't immediately look like it's doing anything that
fundamentally couldn't be parallelized (i.e., by running multiple R
processes at once, one per library; at a glance the logic looks nicely
per-library).
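A rough sketch of the deferred approach: start one background job per
package and then wait for all of them. The package list is an illustrative
subset, and compile_pkg is a hypothetical stand-in that only prints a
message; a real version would start an R process there, and would still
have to respect any ordering between packages, which this sketch ignores:

```shell
#!/bin/sh
# Hypothetical sketch: defer byte-compilation, then run one job per package.
# PKGS and compile_pkg are placeholders, not R's actual build logic.

PKGS="base tools utils stats graphics"

# Stand-in for "start one R process that byte-compiles package $1".
compile_pkg() {
    echo "byte-compiled $1"
}

# Launch every job in the background, then wait on each and count failures.
compile_all() {
    pids=""
    for pkg in $PKGS; do
        compile_pkg "$pkg" &
        pids="$pids $!"
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    # Succeed only if every background job succeeded.
    [ "$failed" -eq 0 ]
}

compile_all
```

Because the jobs run concurrently, their output lines can appear in any
order.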
A third approach could be to try to parallelize the logic in
makeLazyLoad.R. I would expect that to be at best much more difficult,
though.
Anyway, there are lots of things that look like they could in theory be
done here. And I know just enough at this point to be dangerous; not
enough to contribute :-) Hence my asking, has anyone thought about
this? If not, I assume the best thing for me to do would be to poke at
it; try to figure out on my own how this works and what's most
feasible. But if anyone has any pointers, that would likely save me a
bunch of time. And if this is something that you prefer to keep serial
for some reason, that would be good to know too, so I don't spend time
on it.
Thanks,
Adam