[Bioc-devel] build machines

Hervé Pagès hpages at fredhutch.org
Fri Apr 27 21:19:15 CEST 2018


On 04/27/2018 10:50 AM, Martin Morgan wrote:
> For what it's worth, BiocParallel implemented as outlined in its 
> vignette limits the number of cores via
> 
>      if (nzchar(Sys.getenv("BBS_HOME")))
>          cores <- min(4L, cores)
> 
> i.e., checking an environment variable set on the build system. This is 
> highly fragile and I wouldn't necessarily recommend this outside the 
> BiocParallel context.

One problem with this is that when people troubleshoot, they don't
see the same behavior as on the build report.

How about detecting that code is being run in the context of
R CMD build or R CMD check instead? Is there an easy/robust
way to do this?

Thanks,
H.
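
One possible approach, offered as a sketch rather than an established
recipe: 'R CMD check --as-cran' sets the environment variable
_R_CHECK_LIMIT_CORES_, which the parallel package itself consults before
starting more than 2 workers. A package could look at the same variable.
The helper name limitCores() and its arguments are made up for
illustration. As far as I know, a plain 'R CMD check' without --as-cran
leaves the variable unset unless the build system exports it, so this is
not a complete "am I being checked?" test.

```r
## Hypothetical helper, not part of BiocParallel or any package in this thread.
## 'env' defaults to Sys.getenv and is only a parameter so the logic can be
## exercised without touching the real environment.
limitCores <- function(cores, max_in_check = 2L, env = Sys.getenv) {
    ## Same convention the parallel package uses: any non-empty value
    ## other than "false" (case-insensitive) means "limit the cores".
    flag <- tolower(env("_R_CHECK_LIMIT_CORES_", unset = ""))
    in_check <- nzchar(flag) && flag != "false"
    cores <- as.integer(cores)
    if (in_check) min(as.integer(max_in_check), cores) else cores
}

## Typical use: cap whatever the machine reports.
## detectCores() can return NA, hence the fallback to 1 core.
n <- parallel::detectCores()
workers <- limitCores(if (is.na(n)) 1L else n)
```

Because the cap depends only on an environment variable, it can be
reproduced when troubleshooting by exporting _R_CHECK_LIMIT_CORES_=TRUE
before starting R.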

> 
> Martin
> 
> On 04/27/2018 01:39 PM, Ludwig Geistlinger wrote:
>> Hi Hervé,
>>
>>> Some packages are good citizens and limit the number of
>>> cores to 1 or 2 only during 'R CMD check' but some packages
>>> try to use all the cores that are available
>>
>> That seems to be an important note for developers using parallel 
>> computation.
>> What's best practice to realize this within my code, i.e. checking 
>> whether the code is currently subject to R CMD check (and accordingly 
>> reducing the number of cores used)?
>>
>> Thanks,
>> Ludwig
>>
>> -- 
>> Dr. Ludwig Geistlinger
>> CUNY School of Public Health
>>
>> ________________________________________
>> From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of 
>> Kasper Daniel Hansen <kasperdanielhansen at gmail.com>
>> Sent: Friday, April 27, 2018 10:29 AM
>> To: Hervé Pagès
>> Cc: bioc-devel at r-project.org
>> Subject: Re: [Bioc-devel] build machines
>>
>> Thanks.
>>
>> I used
>>    /usr/bin/time -v R CMD check ...
>> to record the max memory usage of the check, which for minfi suggests
>> around 5 GB.  That's a lot.
>>
>> Best,
>> Kasper
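
For a rough look from inside R rather than from the shell, the "max used"
figures reported by gc() can be reset and read back around a single call.
This is only a sketch: peakMemoryMb() is a made-up name, and the number
covers R's own heap in the current process, not child processes or the
whole 'R CMD check' run, so it will not match /usr/bin/time -v.

```r
## Hypothetical helper: approximate peak R heap usage (in Mb) while
## evaluating 'expr'.
peakMemoryMb <- function(expr) {
    invisible(gc(reset = TRUE))  # reset the "max used" statistics
    force(expr)                  # the promise is evaluated here, after the reset
    m <- gc()
    ## The last column of the gc() matrix is "max used" in Mb, with one
    ## row for Ncells and one for Vcells; add the two.
    sum(m[, ncol(m)])
}

## Example: peak usage while allocating and summing a numeric vector.
mb <- peakMemoryMb(sum(numeric(1e6)))
```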
>>
>> On Thu, Apr 26, 2018 at 3:02 PM, Hervé Pagès <hpages at fredhutch.org> 
>> wrote:
>>
>>> Hi,
>>>
>>> The Linux and Windows builders have 32 GB of RAM, the Mac
>>> builders 64 GB.
>>>
>>> We also run concurrent R CMD check's.
>>>
>>> Here is a summary:
>>>
>>>    platform           RAM   nb of     nb of concurrent
>>>                       (GB)  cores        R CMD check's
>>>    ---------------------------------------------------
>>>    Linux (malbecs)     32      20                   10
>>>    Windows (tokays)    32      40                   24
>>>    Mac (meridas)       64      24                   18
>>>
>>> That's a lot of concurrency. And there is actually more
>>> concurrency than that if you consider the fact that many
>>> packages run things in parallel during 'R CMD check'.
>>> Some packages are good citizens and limit the number of
>>> cores to 1 or 2 only during 'R CMD check' but some packages
>>> try to use all the cores that are available. This will have
>>> a strong impact on the overall progress of the builds. We
>>> don't have an easy way to identify those packages right now.
>>>
>>> On average, based on our monitoring of the build machines,
>>> things seem to work ok, i.e. the concurrent R CMD check's
>>> don't seem to be competing too much to access resources.
>>>
>>> But occasionally there could be too much competition. The
>>> crazy big elapsed time compared to the relatively short user
>>> and system times that you observed, Kasper, is likely to reflect
>>> that. It could be a sign that the machine ran out of memory
>>> and started swapping. The fact that it happens to your package
>>> doesn't mean that your package uses too much memory. The swapping is
>>> the result of the **cumulative** memory usage of all the
>>> R CMD check's running at that moment. It could be worth checking
>>> how much memory R CMD check'ing your package uses though.
>>>
>>> The exact set of packages that are being R CMD check'ed at any
>>> given time is in constant fluctuation and will also vary from
>>> one day to the other. This would explain why some days you see
>>> timeouts on some platforms and some days not. We don't have
>>> an easy way to know which packages were competing with yours
>>> during the 40 min window that 'R CMD check' was running on your
>>> package until the build system declared a timeout. It's possible
>>> (by looking at the BBS logs) but is time consuming.
>>>
>>> We should probably add some memory at some point to the Windows
>>> builders. 32 GB is not enough to smoothly run 24 R CMD check's
>>> concurrently.
>>>
>>> H.
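
To make the memory budget concrete, the figures from the summary table
above can be divided out. This is just arithmetic on the numbers quoted in
this thread; the data frame and its column names are made up for
illustration.

```r
## Figures copied from the builder summary table in this thread.
builders <- data.frame(
    platform          = c("Linux (malbecs)", "Windows (tokays)", "Mac (meridas)"),
    ram_gb            = c(32, 32, 64),
    cores             = c(20, 40, 24),
    concurrent_checks = c(10, 24, 18)
)

## Average RAM available to each concurrent R CMD check, in GB.
builders$gb_per_check <- round(builders$ram_gb / builders$concurrent_checks, 2)

builders[, c("platform", "gb_per_check")]
```

This gives about 3.2 GB per check on Linux, 1.33 GB on Windows and 3.56 GB
on Mac. A single check peaking around 5 GB, as reported for minfi, is well
above the average share on every platform, and furthest above it on
Windows.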
>>>
>>>
>>> On 04/26/2018 08:48 AM, Diogo FT Veiga wrote:
>>>
>>>> Hi Daniel,
>>>>
>>>> I have the same issue with my package (new contribution). I just finished
>>>> reviewing the package with the modifications requested.
>>>>
>>>> I am getting a warning because R CMD check exceeds 5 min, but this
>>>> happens only on the Windows machine.
>>>>
>>>> On Linux and OS X the check finishes in <= 4 min, while on Windows it
>>>> takes ~6 min.
>>>>
>>>> http://bioconductor.org/spb_reports/maser_buildreport_20180425114748.html
>>>>
>>>>
>>>> Not sure how to proceed from here.
>>>>
>>>> Thanks,
>>>> Diogo
>>>>
>>>>
>>>> On Thu, Apr 26, 2018 at 9:52 AM, Kasper Daniel Hansen <
>>>> kasperdanielhansen at gmail.com> wrote:
>>>>
>>>>> We have been working on the minfi package lately, with a move to a
>>>>> DelayedArray backend.
>>>>>
>>>>> Right now there are some weird issues regarding timings in R CMD check.
>>>>> Leaving aside the issue that the tests (now disabled) and examples are
>>>>> too slow, we get some very weird behaviour.
>>>>>
>>>>> An example is the current (soon to be replaced) build report of minfi
>>>>> 1.25.2, which prints
>>>>>
>>>>> Examples with CPU or elapsed time > 5s
>>>>>                         user system  elapsed
>>>>> preprocessFunnorm   99.388  0.632  148.554
>>>>> combineArrays       64.104  2.120   68.329
>>>>> bumphunter          62.540  1.392   64.107
>>>>> preprocessNoob      43.944  0.016   44.955
>>>>> preprocessQuantile  33.968  0.064   36.547
>>>>> getAnnotation       31.072  0.024   31.126
>>>>> compartments        18.668  0.188   18.871
>>>>> minfiQC             10.124  6.628 1102.929
>>>>> getSex              10.536  0.012   10.561
>>>>> read.metharray       7.504  2.116   82.713
>>>>> read.metharray.exp   9.076  0.032   10.592
>>>>> mapToGenome-methods  4.648  0.548  163.648
>>>>> mdsPlot              0.340  0.204   14.901
>>>>>
>>>>>
>>>>> on Tokay (Linux).  Note minfiQC, which has an elapsed time that is crazy
>>>>> high compared to user+system.  The previous build report (which I didn't
>>>>> save) had a timeout on all platforms with seemingly similar behaviour,
>>>>> but with the getSex function.  The code did not change in the meantime.
>>>>> For today's build we only see this on Linux, but yesterday all platforms
>>>>> were affected.
>>>>>
>>>>> This is likely to be very hard to debug.  But I am thinking memory
>>>>> issues: this example requires loading an annotation package and a data
>>>>> package, both of which are "big".  How much RAM do the machines have, and
>>>>> are multiple R CMD check's run concurrently?
>>>>>
>>>>> Best,
>>>>> Kasper
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>
>>> -- 
>>> Hervé Pagès
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: hpages at fredhutch.org
>>> Phone:  (206) 667-5791
>>> Fax:    (206) 667-1319
>>>
>>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


