[BioC] possible bug in ShortRead qa/ppnCount
Janet Young
jayoung at fhcrc.org
Thu Nov 17 02:32:15 CET 2011
Hi Martin (seems like you will be the one to pick this up),
I found an obscure bug in ShortRead:::.ppnCount - it'll be an easy fix, I think. I hope the code below should explain all, but if not let me know.
thanks,
Janet
library(ShortRead)
sp <- SolexaPath(system.file("extdata", package = "ShortRead"))
file1 <- file.path(analysisPath(sp), pattern="s_1_sequence.txt")
### take the most common sequence as a fake adaptor sequence
seqs1 <- readFastq(file1)
fakeAdaptor1 <- names(which(table(as.character(sread(seqs1)))==max(table(as.character(sread(seqs1)))))[1])
#### run qa
qa1 <- qa(readFastq(file1), basename(file1), Lpattern=fakeAdaptor1)
#### run qa forcing it to think sample name is different
qa1a <- qa(readFastq(file1), paste(basename(file1),"_a",sep=""), Lpattern=fakeAdaptor1)
### combine
qaboth <- do.call(rbind, list(qa1,qa1a) )
### looking directly at adapterContamination gives correct answer (same contamination level in both)
qaboth[["adapterContamination"]]
# lane contamination
# 1 s_1_sequence.txt 0.01171875
# 2 s_1_sequence.txt_a 0.01171875
### but looking at adapterContamination using ppnCount (as called by report(qa)) gives wrong answer - it's dividing the counts in column 2 by the names (as factors) in column 1. ppnCount should check for column 1 being numeric as well as just not null
ShortRead:::.ppnCount(qaboth[["adapterContamination"]])
# lane contamination
# 1 s_1_sequence.txt 0.012
# 2 s_1_sequence.txt_a 0.006
sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ShortRead_1.12.0 latticeExtra_0.6-19 RColorBrewer_1.0-5 Rsamtools_1.6.1
[5] lattice_0.20-0 Biostrings_2.22.0 GenomicRanges_1.6.2 IRanges_1.12.1
loaded via a namespace (and not attached):
[1] Biobase_2.14.0 bitops_1.0-4.1 BSgenome_1.22.0 grid_2.14.0
[5] hwriter_1.3 RCurl_1.7-0 rtracklayer_1.14.2 tools_2.14.0
[9] XML_3.4-3 zlibbioc_1.0.0
More information about the Bioconductor
mailing list