[Bioc-devel] strange bug in a BioC workflow that only appears on Jenkins

Bernd Klaus bernd.klaus at embl.de
Mon Dec 7 11:00:01 CET 2015


Hi Dan,

wow, thanks a lot for identifying the problem! This is great
and gives me a hint on what to look into.

Thanks again,

Bernd

On Fri, 2015-12-04 at 10:55 -0800, Dan Tenenbaum wrote:
> 
> ----- Original Message -----
> > From: "Bernd Klaus" <bernd.klaus at embl.de>
> > To: "bioc-devel" <bioc-devel at r-project.org>
> > Sent: Thursday, December 3, 2015 3:41:51 AM
> > Subject: [Bioc-devel] strange bug in a BioC workflow that only
> > appears on Jenkins
> 
> > Dear all,
> > 
> > I am currently developing an end-to-end workflow for Microarray
> > analysis.
> > 
> > In this workflow I download some clinical microarray
> > data from ArrayExpress (CEL files),
> > import it with oligo, annotate it using the
> > appropriate ChipDB and then obtain
> > results with limma. This gives me a data.frame "tableC" with
> > the results from limma.
> > 
> > The data set contains paired inflamed/non-inflamed (I/nI) mucosa
> > samples from patients with Crohn's disease (C) or ulcerative
> > colitis (U).
> > 
> > In the workflow I only analyse the differences between I/nI samples
> > within the patients in C and obtain a limma results table called
> > "tableC".
> > 
> > I then want to extract the probeset IDs of
> > the DE genes like so:
> > 
> > DEgenesCD <- rownames(base::subset(tableC, adj.P.Val < 0.1))
> > 
> > Now, on my local computer(s) this gives me something like
> > 
> > # message(paste0(as.character(DEgenesCD)[1:5], collapse = "--"))
> > # > 7928695--8123695--8164535--8009746--7952249
> > 
> > However, on the CI system I get
> > 
> > # > NA--NA--NA--NA--NA
> > 
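One thing worth ruling out: exactly this string also appears whenever DEgenesCD is empty, because indexing a zero-length character vector with 1:5 yields NA. A minimal sketch with a made-up tableC (the probeset IDs and p-values below are purely illustrative, not the real limma output):

```r
# Toy stand-in for the limma results table; values are invented for illustration.
tableC <- data.frame(
  adj.P.Val = c(0.5, 0.8, 0.9),
  row.names = c("7928695", "8123695", "8164535")
)

# No row passes the threshold, so the result is character(0).
DEgenesCD <- rownames(base::subset(tableC, adj.P.Val < 0.1))
n_de <- length(DEgenesCD)

# Indexing past the end of a vector returns NA, which paste0() turns into "NA".
msg <- paste0(as.character(DEgenesCD)[1:5], collapse = "--")
message(msg)
```

Checking length(DEgenesCD) and summary(tableC$adj.P.Val) on the CI system would tell an empty selection apart from a table whose row names really are NA.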
> > So it seems that the content of tableC "disappears" somehow.
> > See e.g.
> > 
> > http://docbuilder.bioconductor.org:8080/job/maEndToEnd/58/label=winbuilder1/console
> > 
> > The minimal dummy workflow that has the bug is here in the svn
> > 
> > https://hedgehog.fhcrc.org/bioconductor/trunk/madman/workflows/maEndToEnd/
> > 
> > Now strangely enough, if I run it on my local machine, save the
> > expression data as an RData object, submit this object to svn and
> > then load the pre-saved object in the workflow, it builds
> > successfully.
> > 
> > So my best guess is that there is something unusual happening
> > during the creation of the eSet from the downloaded data that then
> > somehow affects the result table from limma.
> > 
> > 
> > I have been trying to chase this bug for about three weeks, so any
> > input would be very much appreciated ...
> > 
> 
> So the workflow builder creates a package from your Rmd file and then
> tries to "R CMD build" the package.
> 
> On the workflow builder for linux, I can purl() the Rmd file to
> create an R file and then source() it in R without any errors.
> 
> However, when I try and "R CMD build" the package generated from this
> Rmd file, I get the same error you report.
> 
> Note that R CMD build runs a script which can be found at
> $R_HOME/bin/build, and that script invokes R in a special way. If I
> invoke R the same way:
> 
> R_DEFAULT_PACKAGES= LC_COLLATE=C R
> 
> ...and then source the purl()'ed Rmd file, I do get the same error as
> you report.
> So the issue has to do with one of those two environment variables.
> 
> Starting R and only changing one of the variables:
> 
> R_DEFAULT_PACKAGES= R
> 
> works, but if I change only the other one:
> 
> LC_COLLATE=C R
> 
> it fails. So the problem has to do with the setting of LC_COLLATE.
> LC_COLLATE in turn affects the sort order (see ?locales). So there is
> something in your code (or code in packages that you call) that does
> not work when the sort order is different.
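To illustrate the kind of difference meant here, a small sketch with a made-up vector: in the C locale strings are compared byte by byte, so all upper-case letters sort before lower-case ones, whereas many other locales interleave them. In recent R versions, sort(method = "radix") always orders character vectors as in the C locale, so it shows the C result whatever the session's LC_COLLATE is.

```r
x <- c("b", "A", "a", "B")

# Default sort() on character vectors follows the session's LC_COLLATE,
# so this result can differ between machines.
locale_sorted <- sort(x)

# method = "radix" always sorts character vectors as in the C locale.
c_sorted <- sort(x, method = "radix")

print(Sys.getlocale("LC_COLLATE"))
print(locale_sorted)
print(c_sorted)
```

Any step that relies on sorted names lining up (for example matching sorted IDs by position) could therefore behave differently under R CMD build than in an interactive session.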
> 
> You can debug it by starting R like this:
> 
> LC_COLLATE=C R
> 
> And then sourcing your file:
> 
> 
>  source("dummy-workflow.R", echo=TRUE, max.deparse.length=Inf)
> 
> 
> This assumes that you've first run
> 
> R CMD Stangle dummy-workflow.Rmd
> 
> to produce the dummy-workflow.R file.
> 
> So this should fail and then you will have all the tools of an
> interactive R session (traceback(), sessionInfo(), debug()) available
> to troubleshoot the problem.
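If restarting R is inconvenient, the collation can probably also be switched inside a running session with Sys.setlocale(). This is only a sketch: it is an assumption that this reproduces the failure in the same way as setting LC_COLLATE before start-up, and the source() call is left commented out because it needs the extracted workflow file.

```r
# Remember the current collation so it can be restored afterwards.
old_collate <- Sys.getlocale("LC_COLLATE")

# Switch to C collation, the setting R CMD build uses (LC_COLLATE=C).
Sys.setlocale("LC_COLLATE", "C")
collate_during <- Sys.getlocale("LC_COLLATE")

# Run the extracted workflow code here, e.g.:
# source("dummy-workflow.R", echo = TRUE, max.deparse.length = Inf)

# Restore the previous setting.
Sys.setlocale("LC_COLLATE", old_collate)
```

Packages that cached a sort order when they were loaded would not be affected by the switch, so starting R with the variable already set remains the more faithful test.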
> 
> HTH
> Dan
> 
> 
> 
> > Thanks and best wishes,
> > 
> > Bernd
> > 
> > _______________________________________________
> > Bioc-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel


