[BioC] possible bug in ShortRead qa/ppnCount

Martin Morgan mtmorgan at fhcrc.org
Thu Nov 17 19:55:21 CET 2011


On 11/16/2011 05:32 PM, Janet Young wrote:
> Hi Martin (seems like you will be the one to pick this up),
>
> I found an obscure bug in ShortRead:::.ppnCount - it'll be an easy
> fix, I think.  I hope the code below should explain all, but if not
> let me know.

Thanks Janet should be fixed when 1.12.1 or 1.13.3 makes it to the 
outside world. Martin

>
> thanks,
>
> Janet
>
>
>
> library(ShortRead)
>
> sp<- SolexaPath(system.file("extdata", package = "ShortRead"))
> file1<- file.path(analysisPath(sp), pattern="s_1_sequence.txt")
>
> ### take the most common sequence as a fake adaptor sequence seqs1<-
> readFastq(file1) fakeAdaptor1<-
> names(which(table(as.character(sread(seqs1)))==max(table(as.character(sread(seqs1)))))[1])
>
>  ####  run qa qa1<- qa(readFastq(file1), basename(file1),
> Lpattern=fakeAdaptor1)
>
> ####  run qa forcing it to think sample name is different qa1a<-
> qa(readFastq(file1), paste(basename(file1),"_a",sep=""),
> Lpattern=fakeAdaptor1)
>
> ### combine qaboth<- do.call(rbind, list(qa1,qa1a) )
>
> ### looking directly at adapterContamination gives correct answer
> (same contamination level in both) qaboth[["adapterContamination"]]
>
> #                 lane contamination # 1   s_1_sequence.txt
> 0.01171875 # 2 s_1_sequence.txt_a    0.01171875
>
> ### but looking at adapterContamination using ppnCount (as called by
> report(qa)) gives wrong answer - it's dividing the counts in column 2
> by the names (as factors) in column 1.  ppnCount should check for
> column 1 being numeric as well as just not null
>
> ShortRead:::.ppnCount(qaboth[["adapterContamination"]])
>
> #                 lane contamination # 1   s_1_sequence.txt
> 0.012 # 2 s_1_sequence.txt_a         0.006
>
>
>
> sessionInfo()
>
> R version 2.14.0 (2011-10-31) Platform: i386-apple-darwin9.8.0/i386
> (32-bit)
>
> locale: [1]
> en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages: [1] stats     graphics  grDevices utils
> datasets  methods   base
>
> other attached packages: [1] ShortRead_1.12.0    latticeExtra_0.6-19
> RColorBrewer_1.0-5  Rsamtools_1.6.1 [5] lattice_0.20-0
> Biostrings_2.22.0   GenomicRanges_1.6.2 IRanges_1.12.1
>
> loaded via a namespace (and not attached): [1] Biobase_2.14.0
> bitops_1.0-4.1     BSgenome_1.22.0    grid_2.14.0 [5] hwriter_1.3
> RCurl_1.7-0        rtracklayer_1.14.2 tools_2.14.0 [9] XML_3.4-3
> zlibbioc_1.0.0
>
> _______________________________________________ Bioconductor mailing
> list Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the
> archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793



More information about the Bioconductor mailing list