[BioC] possible bug in ShortRead qa/ppnCount
Martin Morgan
mtmorgan at fhcrc.org
Thu Nov 17 19:55:21 CET 2011
On 11/16/2011 05:32 PM, Janet Young wrote:
> Hi Martin (seems like you will be the one to pick this up),
>
> I found an obscure bug in ShortRead:::.ppnCount - it'll be an easy
> fix, I think. I hope the code below should explain all, but if not
> let me know.
Thanks Janet should be fixed when 1.12.1 or 1.13.3 makes it to the
outside world. Martin
>
> thanks,
>
> Janet
>
>
>
> library(ShortRead)
>
> sp<- SolexaPath(system.file("extdata", package = "ShortRead"))
> file1<- file.path(analysisPath(sp), pattern="s_1_sequence.txt")
>
> ### take the most common sequence as a fake adaptor sequence seqs1<-
> readFastq(file1) fakeAdaptor1<-
> names(which(table(as.character(sread(seqs1)))==max(table(as.character(sread(seqs1)))))[1])
>
> #### run qa qa1<- qa(readFastq(file1), basename(file1),
> Lpattern=fakeAdaptor1)
>
> #### run qa forcing it to think sample name is different qa1a<-
> qa(readFastq(file1), paste(basename(file1),"_a",sep=""),
> Lpattern=fakeAdaptor1)
>
> ### combine qaboth<- do.call(rbind, list(qa1,qa1a) )
>
> ### looking directly at adapterContamination gives correct answer
> (same contamination level in both) qaboth[["adapterContamination"]]
>
> # lane contamination # 1 s_1_sequence.txt
> 0.01171875 # 2 s_1_sequence.txt_a 0.01171875
>
> ### but looking at adapterContamination using ppnCount (as called by
> report(qa)) gives wrong answer - it's dividing the counts in column 2
> by the names (as factors) in column 1. ppnCount should check for
> column 1 being numeric as well as just not null
>
> ShortRead:::.ppnCount(qaboth[["adapterContamination"]])
>
> # lane contamination # 1 s_1_sequence.txt
> 0.012 # 2 s_1_sequence.txt_a 0.006
>
>
>
> sessionInfo()
>
> R version 2.14.0 (2011-10-31) Platform: i386-apple-darwin9.8.0/i386
> (32-bit)
>
> locale: [1]
> en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages: [1] stats graphics grDevices utils
> datasets methods base
>
> other attached packages: [1] ShortRead_1.12.0 latticeExtra_0.6-19
> RColorBrewer_1.0-5 Rsamtools_1.6.1 [5] lattice_0.20-0
> Biostrings_2.22.0 GenomicRanges_1.6.2 IRanges_1.12.1
>
> loaded via a namespace (and not attached): [1] Biobase_2.14.0
> bitops_1.0-4.1 BSgenome_1.22.0 grid_2.14.0 [5] hwriter_1.3
> RCurl_1.7-0 rtracklayer_1.14.2 tools_2.14.0 [9] XML_3.4-3
> zlibbioc_1.0.0
>
> _______________________________________________ Bioconductor mailing
> list Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the
> archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
More information about the Bioconductor
mailing list