[BioC] possible bug in ShortRead qa/ppnCount

Janet Young jayoung at fhcrc.org
Thu Nov 17 02:32:15 CET 2011


Hi Martin (seems like you will be the one to pick this up),

I found an obscure bug in ShortRead:::.ppnCount - it'll be an easy fix, I think.  I hope the code below should explain all, but if not let me know.

thanks,

Janet



library(ShortRead)

sp <- SolexaPath(system.file("extdata", package = "ShortRead"))
file1 <- file.path(analysisPath(sp), pattern="s_1_sequence.txt")

### take the most common sequence as a fake adaptor sequence
seqs1 <- readFastq(file1)
fakeAdaptor1 <- names(which(table(as.character(sread(seqs1)))==max(table(as.character(sread(seqs1)))))[1])

####  run qa 
qa1 <- qa(readFastq(file1), basename(file1), Lpattern=fakeAdaptor1)  

####  run qa forcing it to think sample name is different
qa1a <- qa(readFastq(file1), paste(basename(file1),"_a",sep=""), Lpattern=fakeAdaptor1)  

### combine
qaboth <- do.call(rbind, list(qa1,qa1a) )

### looking directly at adapterContamination gives correct answer (same contamination level in both)
qaboth[["adapterContamination"]]

#                 lane contamination
# 1   s_1_sequence.txt    0.01171875
# 2 s_1_sequence.txt_a    0.01171875

### but looking at adapterContamination using ppnCount (as called by report(qa)) gives wrong answer - it's dividing the counts in column 2 by the names (as factors) in column 1.  ppnCount should check for column 1 being numeric as well as just not null

ShortRead:::.ppnCount(qaboth[["adapterContamination"]])

#                 lane contamination
# 1   s_1_sequence.txt         0.012
# 2 s_1_sequence.txt_a         0.006



sessionInfo()

R version 2.14.0 (2011-10-31)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ShortRead_1.12.0    latticeExtra_0.6-19 RColorBrewer_1.0-5  Rsamtools_1.6.1    
[5] lattice_0.20-0      Biostrings_2.22.0   GenomicRanges_1.6.2 IRanges_1.12.1     

loaded via a namespace (and not attached):
 [1] Biobase_2.14.0     bitops_1.0-4.1     BSgenome_1.22.0    grid_2.14.0       
 [5] hwriter_1.3        RCurl_1.7-0        rtracklayer_1.14.2 tools_2.14.0      
 [9] XML_3.4-3          zlibbioc_1.0.0    



More information about the Bioconductor mailing list