[Bioc-devel] "extra" unit tests

Levi Waldron lwaldron.research at gmail.com
Fri Oct 27 05:41:43 CEST 2017


The specific cases I had in mind were curatedMetagenomicData and
curatedTCGAData - thinking that the entire databases should be downloaded
and syntax-checked at some point, because there could be problems either
with the remote data files or how the convenience downloading functions
process them. But they're big downloads, so it would be slow and mean
hundreds of GB of downloads.

Regular HTTP size and timestamp tests would be lightweight and good (sounds
doable, even if I don't know exactly how yet). But still, in curatedTCGAData
especially, the post-download processing is complicated: converting tables
to SummarizedExperiment and RaggedExperiment, mapping the columns in each
omics data type to the clinical & pathological data, and assembling a
MultiAssayExperiment containing any combination of omics types for one or
more cancer types, and adding metadata (sorry for the shameless plug for
the forthcoming curatedTCGAData...). But anyways I'm not sure how to know
the objects will be assembled correctly without testing them in normal use
situations.
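
A minimal sketch of what such a lightweight size and timestamp check might
look like, using only base R. The function names (parse_remote_meta,
remote_unchanged, fetch_remote_meta) are hypothetical and not part of either
package, and it assumes the server reports Content-Length and Last-Modified
headers, which not every host does:

```r
# Pull the size and timestamp out of raw HTTP header lines, in the form
# returned by base R's curlGetHeaders() (one header per element).
parse_remote_meta <- function(header_lines) {
  get_field <- function(name) {
    pattern <- paste0("^", name, ":[[:space:]]*")
    hit <- grep(pattern, header_lines, ignore.case = TRUE, value = TRUE)
    if (length(hit) == 0L) return(NA_character_)
    # After redirects the final response comes last, so use the last match.
    trimws(sub(pattern, "", hit[length(hit)], ignore.case = TRUE))
  }
  list(
    size = as.numeric(get_field("Content-Length")),
    last_modified = get_field("Last-Modified")
  )
}

# TRUE only if both fields are present and match the recorded values;
# a missing header counts as "changed" so that nothing is silently skipped.
remote_unchanged <- function(current, recorded) {
  isTRUE(current$size == recorded$size) &&
    identical(current$last_modified, recorded$last_modified)
}

# Network wrapper (hypothetical): fetches the headers without the file body.
fetch_remote_meta <- function(url) {
  parse_remote_meta(curlGetHeaders(url))
}
```

This only detects that a remote file changed; it says nothing about whether
the assembled objects are still correct.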

To avoid waste, this is actually a case where a workflow-type check
would make sense - only run if a size, timestamp, or downloader/assembler
file is changed, not documentation or some other change unrelated to the
download & assembly.
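
One way this "only run when something relevant changed" idea could be
sketched in R: keep a fingerprint of the downloader/assembler source files
plus the remote size and timestamp metadata, and compare it with the one
saved at the last successful run. All names here (make_fingerprint,
needs_extended_check, record_fingerprint, the stamp file) are hypothetical,
and where the stamp file would live on a build machine is left open:

```r
# Combine checksums of the downloader/assembler source files with the
# remote metadata (e.g. size and timestamp) into one comparable value.
make_fingerprint <- function(code_files, remote_meta) {
  checksums <- unname(tools::md5sum(code_files))
  c(code = paste(checksums, collapse = ";"),
    remote = paste(unlist(remote_meta), collapse = ";"))
}

# TRUE if there is no earlier record, or if anything relevant changed.
needs_extended_check <- function(fingerprint, stamp_file) {
  if (!file.exists(stamp_file)) return(TRUE)
  !identical(readRDS(stamp_file), fingerprint)
}

# Call this only after the extended check has passed.
record_fingerprint <- function(fingerprint, stamp_file) {
  saveRDS(fingerprint, stamp_file)
}
```

Inside a 'testthat' test, the expensive part could then be guarded with
something like `testthat::skip_if_not(needs_extended_check(fp, stamp))`, so
that documentation-only changes never trigger the big download.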



On Thu, Oct 26, 2017 at 5:11 AM, Martin Morgan <
martin.morgan at roswellpark.org> wrote:

> There is currently some capacity in the build system to support 'extended'
> builds.
>
> One possibility would be to provide facilities for packages to 'opt in' to
> a distinct 'extended' build, with a (weekly?) build report. One could also
> just increase the timeouts of the current builds.
>
> I think there is considerable value to imposing relatively severe time and
> space limitations on packages. A lot of R code is very poorly written, and
> the limits force the developer to confront that; admittedly a common
> response is not to write better R code. The unit test concept is really
> about highly focused tests on modular software; my own 'long' tests have in
> retrospect often been misguided attempts to throw the kitchen sink at code
> and hope that it covers things, rather than to decompose complicated
> functions into testable units that can then be assembled with some degree
> of confidence. Some of the most challenging code to test involves web
> services; probably the approach is not to perform numerous queries but to
> verify that the service is responsive and providing a version that
> your package supports, with non-web queries validating conformance to the
> version. Often build times are dominated by vignettes analyzing 'real'
> data; these are probably more suited to ExperimentData packages where there
> are already more liberal space and time limits, and where the extended
> computation time does not undermine the pedagogical value of easily
> reproduced vignette code.
>
> I wonder how many people would opt in to an extended build. That wasn't,
> for instance, what Levi asked about at the start of the thread.
>
> Martin
>
>
> On 10/25/2017 01:05 PM, Vincent Carey wrote:
>
>> What about some more hardware to improve throughput?  I think complicating
>> the test driving software is less desirable -- although perhaps it is just a
>> day of week check somewhere.
>> I can imagine that it fails on Wednesday but then passes on Thursday and the
>> developer ignores the event...
>> The failure has to become sticky.  I vote for more hardware and a uniform
>> and stringent testing protocol.
>>
>> If there is no grant money for hardware maybe we have to look for more
>> commercial sponsorship.  This part of the project should not be pinching
>> pennies.
>>
>>
>> On Wed, Oct 25, 2017 at 12:56 PM, Kasper Daniel Hansen <
>> kasperdanielhansen at gmail.com> wrote:
>>
>>> I think we need to think about this in the long term. Can we add support
>>> for these major tests in the build system, perhaps not every day, but every
>>> week or month?  The alternative, that it is up to the developer, is not
>>> great I think.  We should still advocate for people writing quicker tests,
>>> but there are some things which just take time.  The advantage of the build
>>> system is that it gets tested on the official 3 platforms, with official
>>> setup.
>>>
>>> Best,
>>> Kasper
>>>
>>>
>>>
>>> On Wed, Oct 25, 2017 at 11:27 AM, Michael Lawrence <
>>> lawrence.michael at gene.com> wrote:
>>>
>>>> Looks like BiocCodeTools should start checking whether people are using
>>>> that and at least make a NOTE of it.
>>>>
>>>> On Tue, Oct 24, 2017 at 8:17 PM, Peter Hickey <peter.hickey at gmail.com>
>>>> wrote:
>>>>
>>>>> A partial answer if you are using the 'testthat' framework: you can use
>>>>> `testthat::skip_on_bioc()` to specify that a test should be skipped if it
>>>>> is running on the BioC build machines. The test will otherwise be run
>>>>> (e.g., during local development). There are some other `testthat::skip*()`
>>>>> functions that may also be useful.
>>>>> Cheers,
>>>>> Pete
>>>>>
>>>>> On Wed, 25 Oct 2017 at 12:47 Levi Waldron <lwaldron.research at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Any thoughts about how to implement optional or "extra" unit tests, that
>>>>>> are too resource intensive to be part of the Bioconductor daily builds, but
>>>>>> that should be run once in a while, say with major updates?
>>>>>>
>>>>>>          [[alternative HTML version deleted]]
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioc-devel at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel



-- 
Levi Waldron
http://www.waldronlab.org
Assistant Professor of Biostatistics     CUNY School of Public Health
US: +1 646-364-9616                                    Skype: levi.waldron
