[Bioc-devel] strange bug in a BioC workflow that only appears on Jenkins

Bernd Klaus bernd.klaus at embl.de
Mon Dec 14 11:43:04 CET 2015


Hi Dan,

I found the bug with Julian's help. In the end it was quite
trivial:

A different LC_COLLATE setting changed the default sort order,
which led to a sample swap when the CEL files were imported.

This in turn changed the results of the downstream DE analysis.
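A minimal R sketch of how such a swap can arise, using hypothetical CEL file names and a hypothetical annotation table (not the actual workflow data). For character vectors, sort() and order() collate according to LC_COLLATE by default, whereas method = "radix" always collates as in the C locale. Aligning the sample annotation to the data by name with match() avoids depending on either order:

```r
# Hypothetical CEL file names; mixed case makes collation differences visible
cel_files <- c("B_2.CEL", "a_1.CEL", "b_1.CEL", "A_2.CEL")

# Locale-dependent: for character vectors the default method collates
# according to LC_COLLATE, so this order can differ between machines
sort(cel_files)

# Locale-independent: radix sorting always collates as in the C locale,
# i.e. by byte value, so all upper-case names come before lower-case ones
sorted_c <- sort(cel_files, method = "radix")
print(sorted_c)

# Hypothetical sample annotation, keyed by file name
pheno <- data.frame(
  file = c("a_1.CEL", "B_2.CEL", "A_2.CEL", "b_1.CEL"),
  tissue = c("I", "nI", "nI", "I"),
  stringsAsFactors = FALSE
)

# Whatever order the expression columns end up in, align the
# annotation to them by name rather than by position
column_order <- sort(cel_files)
pheno_aligned <- pheno[match(column_order, pheno$file), ]
stopifnot(identical(pheno_aligned$file, column_order))
print(pheno_aligned)
```

In many non-C locales the plain sort() call interleaves upper- and lower-case names instead, which is exactly the kind of difference that turns into a sample swap when annotation is attached by position.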

Thanks a lot again for your help and input!

Bernd 


On Mon, 2015-12-07 at 11:00 +0100, Bernd Klaus wrote:
> Hi Dan,
> 
> Wow, thanks a lot for identifying the problem! This is great
> and gives me a hint about what to look into.
> 
> Thanks again,
> 
> Bernd
> 
> On Fri, 2015-12-04 at 10:55 -0800, Dan Tenenbaum wrote:
> > 
> > ----- Original Message -----
> > > From: "Bernd Klaus" <bernd.klaus at embl.de>
> > > To: "bioc-devel" <bioc-devel at r-project.org>
> > > Sent: Thursday, December 3, 2015 3:41:51 AM
> > > Subject: [Bioc-devel] strange bug in a BioC workflow that only
> > > appears on Jenkins
> > 
> > > Dear all,
> > > 
> > > I am currently developing an end-to-end workflow for microarray
> > > analysis.
> > > 
> > > In this workflow I download some clinical microarray
> > > data (CEL files) from ArrayExpress,
> > > import it with oligo, annotate it using the
> > > appropriate ChipDB and then obtain
> > > results with limma. This gives me a data.frame "tableC" with
> > > the results from limma.
> > > 
> > > The data set contains paired inflamed/non-inflamed (I/nI) mucosa
> > > samples from patients with Crohn's disease (C) or ulcerative
> > > colitis (U).
> > > 
> > > In the workflow I only analyse the differences between I/nI samples
> > > within the C patients and obtain a limma results table called
> > > "tableC".
> > > 
> > > I then want to extract the probeset IDs of
> > > the DE genes like so:
> > > 
> > > DEgenesCD <- rownames(base::subset(tableC, adj.P.Val < 0.1))
> > > 
> > > Now, on my local computer(s) this gives me something like
> > > 
> > > # message(paste0(as.character(DEgenesCD)[1:5], collapse = "--"))
> > > # > 7928695--8123695--8164535--8009746--7952249
> > > 
> > > However, on the CI system I get
> > > 
> > > # > NA--NA--NA--NA--NA
> > > 
> > > So it seems that the content of tableC "disappears" somehow.
> > > See e.g.
> > > 
> > > http://docbuilder.bioconductor.org:8080/job/maEndToEnd/58/label=winbuilder1/console
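The NA--NA--NA--NA--NA output is consistent with, though not proof of, an empty DEgenesCD: indexing a zero-length vector with [1:5] pads the result with NA. A minimal R sketch with a hypothetical toy "tableC" (made-up probeset IDs and p-values) shows the effect, and that head() returns at most five elements without padding:

```r
# Hypothetical toy limma-style results table (made-up IDs and p-values)
tableC <- data.frame(
  logFC = c(1.2, -0.8, 0.1),
  adj.P.Val = c(0.01, 0.05, 0.60),
  row.names = c("probe_1", "probe_2", "probe_3")
)

# Same extraction as in the workflow
DEgenesCD <- rownames(base::subset(tableC, adj.P.Val < 0.1))
message(paste0(head(DEgenesCD, 5), collapse = "--"))  # probe_1--probe_2

# If no probeset passes the cutoff, the result is character(0) ...
noDE <- rownames(base::subset(tableC, adj.P.Val < 0.001))

# ... and indexing it with [1:5] pads with NA
message(paste0(as.character(noDE)[1:5], collapse = "--"))  # NA--NA--NA--NA--NA

# head() never pads, so an empty result stays visibly empty
length(head(noDE, 5))  # 0
```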
> > > 
> > > The minimal dummy workflow that shows the bug is in svn:
> > > 
> > > https://hedgehog.fhcrc.org/bioconductor/trunk/madman/workflows/maEndToEnd/
> > > 
> > > Strangely enough, if I run it on my local machine, save the
> > > expression data as an RData object, commit this object to svn and
> > > then load the pre-saved object in the workflow, it builds successfully.
> > > 
> > > So my best guess is that something unusual happens during
> > > the creation of the eSet from the downloaded data, which then somehow
> > > affects the result table from limma.
> > > 
> > > 
> > > I have been trying to chase this bug for about three weeks, so any
> > > input would be very much appreciated.
> > > 
> > 
> > So the workflow builder creates a package from your Rmd file and then
> > tries to "R CMD build" the package.
> > 
> > On the workflow builder for linux, I can purl() the Rmd file to
> > create an R file and then source() it in R without any errors.
> > 
> > However, when I try to "R CMD build" the package generated from this
> > Rmd file, I get the same error you report.
> > 
> > Note that R CMD build runs a script which can be found at
> > $R_HOME/bin/build, and that script invokes R in a special way. If I
> > invoke R the same way:
> > 
> > R_DEFAULT_PACKAGES= LC_COLLATE=C R
> > 
> > ...and then source the purl()'ed Rmd file, I do get the same error as
> > you report.
> > So the issue has to do with one of those two environment variables.
> > 
> > Starting R and only changing one of the variables:
> > 
> > R_DEFAULT_PACKAGES= R
> > 
> > works, but if I change only the other one:
> > 
> > LC_COLLATE=C R
> > 
> > it fails. So the problem has to do with the setting of LC_COLLATE.
> > LC_COLLATE in turn affects the sort order (see ?locales). So there is
> > something in your code (or code in packages that you call) that does
> > not work when the sort order is different.
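To check whether code is sensitive to LC_COLLATE from within a running session, the collation locale can also be switched temporarily with Sys.setlocale(). A minimal sketch; with_collate() is a hypothetical helper, not an existing function, and only the "C" locale is assumed to be available on every system:

```r
# Hypothetical helper: evaluate fun() with LC_COLLATE temporarily set to
# `locale`, restoring the previous setting afterwards
with_collate <- function(locale, fun) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  # Sys.setlocale() returns "" if the OS cannot honour the request
  if (identical(Sys.setlocale("LC_COLLATE", locale), "")) {
    stop("locale not supported on this system: ", locale)
  }
  fun()
}

x <- c("b", "A", "a", "B")

# Under the C locale strings compare by byte value: upper case first
in_c <- with_collate("C", function() sort(x))
print(in_c)  # [1] "A" "B" "a" "b"

# Compare with the session's own collation; if the two differ, any code
# that relies on sort() or order() of strings is locale-sensitive here
in_session <- sort(x)
if (!identical(in_c, in_session)) {
  message("sort order depends on LC_COLLATE in this session")
}
```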
> > 
> > You can debug it by starting R like this:
> > 
> > LC_COLLATE=C R
> > 
> > And then sourcing your file:
> > 
> > 
> >  source("dummy-workflow.R", echo = TRUE, max.deparse.length = Inf)
> > 
> > 
> > This assumes that you have first run
> > 
> > R CMD Stangle dummy-workflow.Rmd
> > 
> > to produce the dummy-workflow.R file.
> > 
> > So this should fail and then you will have all the tools of an
> > interactive R session (traceback(), sessionInfo(), debug()) available
> > to troubleshoot the problem.
> > 
> > HTH
> > Dan
> > 
> > 
> > 
> > > Thanks and best wishes,
> > > 
> > > Bernd
> > > 
> > > _______________________________________________
> > > Bioc-devel at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/bioc-devel


