[Bioc-devel] "extra" unit tests

Sean Davis seandavi at gmail.com
Fri Oct 27 15:49:22 CEST 2017


We do similar testing (mostly upstream of package building) for GEOmetadb
and SRAdb. I have been thinking of this problem as "integration testing"
rather than "unit testing".

https://stackoverflow.com/questions/5357601/whats-the-difference-between-unit-tests-and-integration-tests

The build system as it exists is great for unit testing, but not so much
for integration tests. Workflow-based "testing" might fall under the
integration testing definition. The unit testing frameworks in R (testthat,
for example) can be applied to integration testing, but I think it is worth
keeping the two types of testing somewhat separate for the reasons pointed
out in the stackoverflow discussion.
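
A minimal sketch of that separation, assuming testthat is installed. The
environment variable RUN_INTEGRATION_TESTS and both helper functions are
hypothetical names chosen for illustration, not Bioconductor conventions;
skip_on_bioc() is the testthat function mentioned further down the thread.

```r
# Hypothetical opt-in switch: integration tests run only when the
# (assumed) environment variable RUN_INTEGRATION_TESTS is set to "true".
run_integration_tests <- function() {
  identical(tolower(Sys.getenv("RUN_INTEGRATION_TESTS", "false")), "true")
}

# Hypothetical helper: report whether a remote resource differs from the
# last recorded check, comparing only size and timestamp metadata.
resource_changed <- function(previous, current) {
  !identical(previous$size, current$size) ||
    !identical(previous$modified, current$modified)
}

if (requireNamespace("testthat", quietly = TRUE)) {
  library(testthat)

  # Unit test: fast, no network access, safe for the daily builds.
  test_that("unit: change detection compares size and timestamp", {
    old <- list(size = 1024, modified = "2017-10-01")
    new <- list(size = 2048, modified = "2017-10-01")
    expect_false(resource_changed(old, old))
    expect_true(resource_changed(old, new))
  })

  # Integration test: skipped on the build machines and skipped locally
  # unless explicitly requested.
  test_that("integration: full download and assembly", {
    skip_on_bioc()
    skip_if_not(run_integration_tests(), "integration tests not requested")
    # The expensive download-and-assemble checks would go here.
    succeed()
  })
}
```

With this layout the slow test only runs when someone sets the variable
before invoking the test suite, so the two kinds of test can share one
framework without sharing a schedule.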

Sean


On Thu, Oct 26, 2017 at 11:41 PM, Levi Waldron <lwaldron.research at gmail.com>
wrote:

> The specific cases I had in mind were curatedMetagenomicData and
> curatedTCGAData - thinking that the entire databases should be downloaded
> and syntax-checked at some point, because there could be problems either
> with the remote data files or how the convenience downloading functions
> process them. But they're big downloads: slow, and hundreds of GB in
> total.
>
> Regular http size and timestamp tests would be lightweight and good (sounds
> doable even if I don't know exactly how yet). But still, in curatedTCGAData
> especially, the post-download processing is complicated: converting tables
> to SummarizedExperiment and RaggedExperiment, mapping the columns in each
> omics data type to the clinical & pathological data, and assembling a
> MultiAssayExperiment containing any combination of omics types for one or
> more cancer types, and adding metadata (sorry for the shameless plug for
> the forthcoming curatedTCGAData...). But anyways I'm not sure how to know
> the objects will be assembled correctly without testing them in normal use
> situations.
>
> To avoid waste, this is actually a case where a workflow-type check
> would make sense - only run if a size, timestamp, or downloader/assembler
> file has changed, not for documentation or some other change unrelated to
> the download & assembly.
>
>
>
> On Thu, Oct 26, 2017 at 5:11 AM, Martin Morgan <
> martin.morgan at roswellpark.org> wrote:
>
> > There is currently some capacity in the build system to support
> > 'extended' builds.
> >
> > One possibility would be to provide facilities for packages to 'opt in'
> > to a distinct 'extended' build, with a (weekly?) build report. One could
> > also just increase the timeouts of the current builds.
> >
> > I think there is considerable value to imposing relatively severe time
> > and space limitations on packages. A lot of R code is very poorly
> > written, and the limits force the developer to confront that; admittedly
> > a common response is not to write better R code. The unit test concept is
> > really about highly focused tests on modular software; my own 'long'
> > tests have in retrospect often been misguided attempts to throw the
> > kitchen sink at code and hope that it covers things, rather than to
> > decompose complicated functions into testable units that can then be
> > assembled with some degree of confidence. Some of the most challenging
> > code to test involves web services; probably the approach is not to
> > perform numerous queries but to verify that the service is responsive and
> > providing a version that your package supports, with non-web queries
> > validating conformance to the version. Often build times are dominated by
> > vignettes analyzing 'real' data; these are probably more suited to
> > ExperimentData packages where there are already more liberal space and
> > time limits, and where the extended computation time does not undermine
> > the pedagogical value of easily reproduced vignette code.
> >
> > I wonder how many people would opt in to an extended build. That wasn't,
> > for instance, what Levi asked about at the start of the thread.
> >
> > Martin
> >
> >
> > On 10/25/2017 01:05 PM, Vincent Carey wrote:
> >
> >> What about some more hardware to improve throughput?  I think
> >> complicating the test driving software is less desirable -- although
> >> perhaps it is just a day of week check somewhere.
> >> I can imagine that it fails on wednesday but then passes on thursday and
> >> developer ignores the event...
> >> The failure has to become sticky.  I vote for more hardware and a
> >> uniform and stringent testing protocol.
> >>
> >> If there is no grant money for hardware maybe we have to look for more
> >> commercial sponsorship.  This part of the project should not be
> >> pinching pennies.
> >>
> >>
> >> On Wed, Oct 25, 2017 at 12:56 PM, Kasper Daniel Hansen <
> >> kasperdanielhansen at gmail.com> wrote:
> >>
> >>> I think we need to think about this in the long term. Can we add support
> >>> for these major tests in the build system, perhaps not every day, but
> >>> every week or month?  The alternative, that it is up to the developer,
> >>> is not great I think.  We should still advocate for people writing
> >>> quicker tests, but there are some things which just take time.  The
> >>> advantage of the build system is that it gets tested on the official 3
> >>> platforms, with official setup.
> >>>
> >>> Best,
> >>> Kasper
> >>>
> >>>
> >>>
> >>> On Wed, Oct 25, 2017 at 11:27 AM, Michael Lawrence <
> >>> lawrence.michael at gene.com> wrote:
> >>>
> >>>> Looks like BiocCodeTools should start checking whether people are using
> >>>> that and at least make a NOTE of it.
> >>>>
> >>>> On Tue, Oct 24, 2017 at 8:17 PM, Peter Hickey <peter.hickey at gmail.com>
> >>>> wrote:
> >>>>
> >>>>> A partial answer if you are using the 'testthat' framework: you can use
> >>>>> `testthat::skip_on_bioc()` to specify that a test should be skipped if
> >>>>> it is running on the BioC build machines. The test will otherwise be run
> >>>>> (e.g., during local development). There are some other
> >>>>> `testthat::skip*()` functions that may also be useful.
> >>>>> Cheers,
> >>>>> Pete
> >>>>>
> >>>>> On Wed, 25 Oct 2017 at 12:47 Levi Waldron <lwaldron.research at gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Any thoughts about how to implement optional or "extra" unit tests,
> >>>>>> that are too resource intensive to be part of the Bioconductor daily
> >>>>>> builds, but that should be run once in a while, say with major updates?
> >>>>>>
> >>>>>>          [[alternative HTML version deleted]]
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Bioc-devel at r-project.org mailing list
> >>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
> --
> Levi Waldron
> http://www.waldronlab.org
> Assistant Professor of Biostatistics     CUNY School of Public Health
> US: +1 646-364-9616                                           Skype:
> levi.waldron
>
>



-- 
Sean Davis, MD, PhD
Center for Cancer Research
National Cancer Institute
National Institutes of Health
Bethesda, MD 20892
https://seandavi.github.io/
https://twitter.com/seandavis12