[Bioc-devel] strange bug in a BioC workflow that only appears on Jenkins

Dan Tenenbaum dtenenba at fredhutch.org
Fri Dec 4 19:55:21 CET 2015



----- Original Message -----
> From: "Bernd Klaus" <bernd.klaus at embl.de>
> To: "bioc-devel" <bioc-devel at r-project.org>
> Sent: Thursday, December 3, 2015 3:41:51 AM
> Subject: [Bioc-devel] strange bug in a BioC workflow that only appears on Jenkins

> Dear all,
> 
> I am currently developing an end-to-end workflow for Microarray
> analysis.
> 
> In this workflow I download some clinical microarray
> data from ArrayExpress (CEL files),
> import it with oligo, annotate it using the
> appropriate ChipDB and then obtain
> results with limma. This gives me a data.frame "tableC" with
> the results from limma.
> 
> The data set contains paired inflamed/non-inflamed (I/nI) mucosa
> samples from patients with Crohn's disease (C) or ulcerative colitis
> (U).
> 
> In the workflow I only analyse the differences between I/nI samples
> within the patients in C and obtain a limma results table called
> "tableC".
> 
> I then want to extract the probeset IDs of
> the DE genes like so:
> 
> DEgenesCD <- rownames(base::subset(tableC, adj.P.Val < 0.1))
> 
> Now, on my local computer(s) this gives me something like
> 
> # message(paste0(as.character(DEgenesCD)[1:5], collapse = "--"))
> # > 7928695--8123695--8164535--8009746--7952249
> 
> However, on the CI system I get
> 
> # > NA--NA--NA--NA--NA
> 
> So it seems that the content of tableC "disappears" somehow.
> See e.g.
> 
> http://docbuilder.bioconductor.org:8080/job/maEndToEnd/58/label=winbuilder1/console
> 
> The minimal dummy workflow that has the bug is here in the svn
> 
> https://hedgehog.fhcrc.org/bioconductor/trunk/madman/workflows/maEndToEnd/
> 
> Now strangely enough, if I run it on my local machine, save the
> expression data as an RData object, submit this object to svn and then
> load the pre-saved object in the workflow, it builds successfully.
> 
> So my best guess is that there is something unusual happening during
> the creation of the eSet from the downloaded data that then somehow
> affects the result table from limma.
> 
> 
> I have been trying to chase this bug for ca. three weeks, so any input
> would be very much appreciated ...
> 

So the workflow builder creates a package from your Rmd file and then tries to "R CMD build" the package.

On the workflow builder for Linux, I can purl() the Rmd file to create an R file and then source() it in R without any errors.

However, when I try to "R CMD build" the package generated from this Rmd file, I get the same error you report.

Note that R CMD build runs a script which can be found at $R_HOME/bin/build, and that script invokes R in a special way. If I invoke R the same way:

R_DEFAULT_PACKAGES= LC_COLLATE=C R

...and then source the purl()'ed Rmd file, I get the same error you report.
So the issue has to do with one of those two environment variables.

Starting R and only changing one of the variables:

R_DEFAULT_PACKAGES= R

works, but if I change only the other one:

LC_COLLATE=C R

it fails. So the problem has to do with the setting of LC_COLLATE. LC_COLLATE controls how strings are collated, i.e. the order that sort() and order() produce and the result of comparing character strings (see ?locales). So there is something in your code (or in code from packages that you call) that does not work when the sort order is different.
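To make the collation difference concrete, here is a small shell sketch (not taken from the workflow; the four input strings are made up for illustration). It sorts the same strings once under the C locale and once under whatever locale your shell is currently using. It sets LC_ALL=C rather than LC_COLLATE=C only because LC_ALL, if it happens to be set in your environment, overrides LC_COLLATE.

```shell
# Sort four made-up strings under the C locale. There, strings compare
# by byte value, so all uppercase ASCII letters sort before lowercase ones.
c_sorted=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort | paste -s -d ' ' -)
echo "C locale:       $c_sorted"    # prints "C locale:       A B a b"

# Same input under the shell's current locale. The result depends on
# your settings; many UTF-8 locales interleave upper and lower case.
env_sorted=$(printf 'b\nA\na\nB\n' | sort | paste -s -d ' ' -)
echo "current locale: $env_sorted"
```

If the two lines differ on your machine, any code that relies on sorted or matched character data (row names, probe IDs, factor levels) can behave differently under R CMD build than in an interactive session.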

You can debug it by starting R like this:

LC_COLLATE=C R

And then sourcing your file:


 source("dummy-workflow.R", echo=TRUE, max.deparse.length=Inf)


This assumes that you've first run

R CMD Stangle dummy-workflow.Rmd

to produce the dummy-workflow.R file.

So this should fail and then you will have all the tools of an interactive R session (traceback(), sessionInfo(), debug()) available to troubleshoot the problem.

HTH
Dan



> Thanks and best wishes,
> 
> Bernd
> 
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel


