[R-pkg-devel] install.R running out of memory

Tomas Kalibera tom@@@k@||ber@ @end|ng |rom gm@||@com
Mon Nov 4 10:15:59 CET 2019


On 11/3/19 1:05 PM, Viktor Gal wrote:
> ah yeah i forgot to mention. it’s the same, i.e. it’s not the byte code compilation that causes this behaviour but the preparation for lazy loading.

R is not optimized for such cases (generated code, a source file with
more than 100,000 lines of code). Still, R has bindings for a large
number of external libraries, so it should be possible to make these
bindings several orders of magnitude smaller, and then they would likely
work well.

Making R work well on files like shogun.R would probably require a large
amount of non-trivial work on R internals. I would be surprised if it
were just, say, a memory leak that we could fix quickly to solve the
issue; it may well be that some data structures and algorithms simply
won't scale to this extent. If you want to find out, you can debug using
the usual R means (the R profiler, see ?Rprof; running the script with
?source, perhaps with source references disabled), but to interpret the
results you may have to go deep into the implementation of R and, in
this case, of S4.
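A minimal sketch of that profiling approach, under stated assumptions: because shogun.R is large, the script below writes a small synthetic stand-in with hypothetical S4 names (Cls1, meth1, ...), which is only a guess at the shape of generated binding code. It then sources that file with keep.source = FALSE (source references disabled) while Rprof() records time and memory. To profile the real file, the source() call would point at shogun.R instead.

```r
library(methods)

# Hypothetical stand-in for a generated binding file such as shogun.R:
# n S4 class/generic/method triples (the names Cls1, meth1, ... are made up).
n <- 300L
gen_file <- tempfile(fileext = ".R")
writeLines(sprintf(
  'setClass("Cls%1$d", slots = c(ref = "numeric"))
setGeneric("meth%1$d", function(x, ...) standardGeneric("meth%1$d"))
setMethod("meth%1$d", "Cls%1$d", function(x, ...) x@ref)',
  seq_len(n)), gen_file)

# Profile time and memory while sourcing the file with source
# references disabled (keep.source = FALSE).
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01, memory.profiling = TRUE)
source(gen_file, keep.source = FALSE)
Rprof(NULL)

# summaryRprof() needs at least one recorded sample; a very short run
# may record none, so check before summarizing (line 1 is a header).
if (length(readLines(prof_file)) > 1) {
  prof <- summaryRprof(prof_file, memory = "both")
  # Functions ranked by total time; in this synthetic file, look for
  # S4 functions such as setClass and setMethod near the top.
  print(head(prof$by.total, 10))
}
```

Setting interval below the 0.02 s default only makes short runs more likely to record samples; on a file the size of shogun.R the default should be enough.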

Preparation for lazy loading starts by sourcing the file; the details
can be found in the source code of package installation and in the
documentation. I tried it quickly and saw a lot of time spent in S4,
which is not surprising, as the generated file stresses S4 well beyond
what is normally the case with R. But I would not be surprised if other
bottlenecks showed up later, and even if you managed to prepare the
package for lazy loading, there would probably be significant overheads
at runtime. Still, you could experiment with modifying the code
generator to avoid the bottlenecks you identify.
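One rough experiment along those lines is to check whether the cost of sourcing grows linearly with the number of generated definitions. The sketch below times synthetic files of increasing size; the write_defs() helper and all generated names are hypothetical, and the class/generic/method pattern is only an assumption about what a binding generator might emit. A per-definition time that keeps rising with n would hint at a step that does not scale.

```r
library(methods)

# Hypothetical helper: write n S4 class/generic/method triples whose
# names start with `prefix`, and return the path of the generated file.
write_defs <- function(n, prefix) {
  path <- tempfile(fileext = ".R")
  writeLines(sprintf(
    'setClass("%2$s%1$d", slots = c(ref = "numeric"))
setGeneric("%2$sm%1$d", function(x, ...) standardGeneric("%2$sm%1$d"))
setMethod("%2$sm%1$d", "%2$s%1$d", function(x, ...) x@ref)',
    seq_len(n), prefix), path)
  path
}

sizes <- c(50L, 100L, 200L)

# Source each file (source references disabled) and record wall-clock
# time. A distinct prefix per run avoids redefining earlier classes;
# the first run may also include some one-off warm-up cost.
elapsed <- vapply(seq_along(sizes), function(i) {
  path <- write_defs(sizes[i], paste0("S", i, "_"))
  system.time(source(path, keep.source = FALSE))[["elapsed"]]
}, numeric(1))

res <- data.frame(n = sizes, seconds = elapsed,
                  ms_per_def = 1000 * elapsed / sizes)
print(res)
```

The same harness could be pointed at alternative generator outputs (for example, fewer generics with more methods each) to compare their per-definition cost.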

If your primary goal is to create R bindings for an external library,
I'd recommend looking at how other packages do it to see what scales
(there should be a way to make the code much smaller and, in most
cases, easy to write by hand, even though some interfaces are
generated, too).

Best
Tomas
> cheers,
> viktor
>
>> On 3 Nov 2019, at 06:53, Uwe Ligges <ligges using statistik.tu-dortmund.de> wrote:
>>
>> What happens if you disable byte code compilation?
>>
>> Best,
>> Uwe Ligges
>>
>> On 02.11.2019 19:37, Viktor Gal wrote:
>>> Hi Dirk,
>>> no worries, thnx for the feedback!
>>> cheers,
>>> viktor
>>>> On 2 Nov 2019, at 13:58, Viktor Gal <wiking using maeth.com> wrote:
>>>>
>>>> Hi Dirk,
>>>>
>>>> so the project is open source, you can reproduce the error yourself (but note it’ll take a long time to actually compile it). steps for reproducing:
>>>> git clone https://github.com/shogun-toolbox/shogun.git
>>>> cd shogun
>>>> git checkout feature/shared_ptr
>>>> mkdir build
>>>> cd build
>>>> cmake -DINTERFACE_R=ON ..
>>>> make
>>>> make install
>>>>
>>>> (it requires tons of dependencies… if you have docker you can docker pull shogun/shogun-dev and run things inside the container)
>>>>
>>>> the make install part runs the R CMD INSTALL so that’ll cause the problem.
>>>>
>>>> but i’ve just uploaded the generated R code that causes the problem here, note the script is 7 MB, i.e. 175k LoC, so you’d better wget/curl it:
>>>> http://maeth.com/shogun.R
>>>>
>>>> cheers,
>>>> viktor
>>>>
>>>>> On 2 Nov 2019, at 13:52, Dirk Eddelbuettel <edd using debian.org> wrote:
>>>>>
>>>>>
>>>>> Hi Viktor,
>>>>>
>>>>> On 2 November 2019 at 13:09, Viktor Gal wrote:
>>>>> | I’m developing an ML library that has R bindings… when installing the library with R CMD INSTALL the R process is running out of memory (50G+ ram) when doing:
>>>>> | ** byte-compile and prepare package for lazy loading
>>>>> |
>>>>> | any ideas how i could debug this part of code, to figure out what is actually happening and why there is a memory leak?
>>>>>
>>>>> Easiest for us to help if we can see code -- so if you have a public repo
>>>>> somewhere, please send the link.
>>>>>
>>>>> I suspect you have some sort of recursion or circular dependency
>>>>> somewhere. It would be very hard for R to run out of 50 GB. But we cannot say
>>>>> more.
>>>>>
>>>>> So maybe triage. In a situation like this, when a (supposedly complete)
>>>>> package draft of mine fails "top-down", I often re-validate the toolchain
>>>>> "bottom-up" with a minimal package. If that works, keep adding pieces step by
>>>>> step from the 'not-working large package' to the 'small working' package
>>>>> while continuously ensuring that it still builds.
>>>>>
>>>>> Hope this helps, Dirk
>>>>>
>>>>> -- 
>>>>> http://dirk.eddelbuettel.com | @eddelbuettel | edd using debian.org
>>>> ______________________________________________
>>>> R-package-devel using r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-package-devel


