[Bioc-sig-seq] EdgeR bug (stable (1.4.3) and devel (1.5.5)) : the genes "columns" does not get ordered as the "counts" in the topTags results.

Nicolas Delhomme delhomme at embl.de
Thu Jan 21 16:03:00 CET 2010


Good afternoon,

I've had the following bug in edgeR version 1.4.3 (bioC 2.5 stable).

I created my object like this:

dge<-DGEList(
                counts=counts,
                group=sub("\\.\\d","",colnames(counts)),
                lib.size=colSums(counts),
                genes=transcripts)

Then I estimated the common dispersion:

dge <- estimateCommonDisp(dge)

And did a contrast between two conditions and got the information back  
in a data.frame:

tags <- topTags(exactTest(dge,pair=c("gal4","twi")),n=.Machine 
$integer.max)

There, I realized that the genes were not ordered accordingly to the  
colnames. The colnames are correct. See the results below.

 > tags$table[1:10,]
                   genes   logConc     logFC       PValue          FDR
FBtr0084431 FBtr0087887 -33.16413 33.703845 9.050816e-50 1.834510e-45
FBtr0084439 FBtr0087893 -17.89889  5.696000 2.391551e-45 2.423717e-41
FBtr0071928 FBtr0074873 -19.97871  7.066663 1.814922e-43 1.226222e-39
FBtr0077809 FBtr0070400 -33.99876 32.034581 1.677380e-29 8.499706e-26
FBtr0300181 FBtr0074211 -21.99303  7.636927 1.945823e-26 7.887978e-23
FBtr0301322 FBtr0084735 -34.32748 31.377138 8.141553e-24 2.750352e-20
FBtr0299839 FBtr0084424 -19.89508  4.299143 9.575473e-23 2.426066e-19
FBtr0299841 FBtr0084426 -19.89508  4.299143 9.575473e-23 2.426066e-19
FBtr0299840 FBtr0084425 -19.91022  4.268863 2.075009e-22 4.673152e-19
FBtr0079322 FBtr0071074 -19.40663  3.933231 1.308765e-21 2.652735e-18

So I downloaded the latest version (1.5.5, bioC 2.6 devel) and I could  
reproduce the same bug. Actually before I could do that I got yet  
another error message saying:

"Number of rows in the annotation object 'genes' is not equal to the  
number of rows in the matrix of counts.
   Must have an annotation for each gene/tag, if annotation is to be  
used."

whereas I didn't change anything to the above command lines. I tracked  
it down and what happens is that rows with empty counts are removed  
from the counts object but not from the genes one. And this obviously  
has the sanity check failing.

Changing the line 76 in classes.R (today's svn checkout) from

	else if(nrow(as.data.frame(genes))==nrow(x$counts)) x$genes <-  
as.data.frame(genes, stringsAsFactors=FALSE)

to

	else if(nrow(as.data.frame(genes)[!allZeros,,drop=FALSE])==nrow(x 
$counts)) x$genes <- as.data.frame(genes, stringsAsFactors=FALSE)[! 
allZeros,,drop=FALSE]

corrects that problem. I haven't investigated the afore mentioned one,  
as I expect it to be in some deeper functions and that I could work  
around it by selecting on the colnames.

Regards,

Nicolas Delhomme

PS: sessionInfo()

R version 2.10.0 (2009-10-26)
x86_64-unknown-linux-gnu

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
  [1] doMC_1.2.0      multicore_0.1-3 foreach_1.3.0   codetools_0.2-2
  [5] iterators_1.0.3 edgeR_1.5.5     DESeq_0.7.4     locfit_1.5-5
  [9] lattice_0.18-3  akima_0.5-4     Biobase_2.6.1

loaded via a namespace (and not attached):
  [1] annotate_1.24.1     AnnotationDbi_1.8.1 DBI_0.2-5
  [4] dgenb_0.4.0         genefilter_1.28.2   geneplotter_1.24.0
  [7] grid_2.10.0         limma_3.2.1         MASS_7.3-4
[10] RColorBrewer_1.0-2  RSQLite_0.8-1       splines_2.10.0
[13] survival_2.35-8     xtable_1.5-6





---------------------------------------------------------------
Nicolas Delhomme

High Throughput Functional Genomics Center

European Molecular Biology Laboratory

Tel: +49 6221 387 8426
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany



More information about the Bioc-sig-sequencing mailing list