[Bioc-sig-seq] shortRead qa() minor bug

Martin Morgan mtmorgan at fhcrc.org
Wed May 18 06:18:58 CEST 2011


On 05/17/2011 05:34 PM, Marcus Davy wrote:
> This might also work;
>
> dfOccur <- with(df, {
>          nOccur <- tapply(nOccurrences, lane, c)
>          cumulative <- tapply(nOccurrences * nReads, lane, function(elt) {
>              cs <- cumsum(elt)
>              (cs - cs[1] + 1)/max(1, (diff(range(cs))+1))  ## Add 1 to
> difference
>          })
>         data.frame(nOccurrences = unlist(nOccur), cumulative =
> unlist(cumulative),
>              lane =lane, row.names=NULL)  ## Removing row names
>      })

Thanks Marcus I made the diff()+1 and row.names change to the devel 
branch; does it have more than minor aesthetic consequences? Martin
>
> Marcus
>
>
> On Wed, May 18, 2011 at 12:25 PM, Marcus Davy <mdavy86 at gmail.com
> <mailto:mdavy86 at gmail.com>> wrote:
>
>     Hi Martin,
>     there appears to be a minor bug in one of the qa() plots in the code;
>
>     ShortRead:::.plotReadOccurrences
>
>     If the code chunk which generates the dataset for plotting the read
>     occurences is run manually on an example dataset, the cumulative
>     proportion of reads can be slightly greater than 1. It looks like a
>     slight error in the denominator of the calculation.
>
>     library(ShortRead)
>
>     exptPath <- system.file("extdata", package = "ShortRead")
>     sp <- SolexaPath(exptPath)
>     qa <- qa(sp)
>
>     df <- qa[["sequenceDistribution"]]
>
>     tmp <- with(df, {
>              nOccur <- tapply(nOccurrences, lane, c)
>              cumulative <- tapply(nOccurrences * nReads, lane,
>     function(elt) {
>                  cs <- cumsum(elt)
>                  (cs - cs[1] + 1)/max(1, diff(range(cs))) ## denominator
>     under estimates by 1
>              })
>              lane <- tapply(lane, lane, c)
>              data.frame(nOccurrences = unlist(nOccur), cumulative =
>     unlist(cumulative),
>                  lane = unlist(lane))
>          })
>
>     print(tmp)
>                       nOccurrences   cumulative lane
>     s_2_export.txt1             1 0.0008347245    1
>     s_2_export.txt2             2 0.0091819699    1
>     s_2_export.txt3             3 0.0116861436    1
>     s_2_export.txt4             5 0.0158597663    1
>     s_2_export.txt5            10 0.0242070117    1
>     s_2_export.txt6             1 0.6435726210    1
>     s_2_export.txt7             2 0.6519198664    1
>     s_2_export.txt8             3 0.6544240401    1
>     s_2_export.txt9             9 0.6619365609    1
>     s_2_export.txt10            1 0.9991652755    1
>     s_2_export.txt11            2 1.0008347245    1
>
>     tmp[which(tmp$cumulative>1),]
>                       nOccurrences cumulative lane
>     s_2_export.txt11            2   1.000835    1
>
>     tmp <- with(df, {
>              nOccur <- tapply(nOccurrences, lane, c)
>              cumulative <- tapply(nOccurrences * nReads, lane,
>     function(elt) {
>                  cs <- cumsum(elt)
>                  (cs - cs[1] + 1)/max(1, (diff(range(cs))+1))  ## Add 1
>     to difference
>              })
>              lane <- tapply(as.character(lane), lane, c) ## as.character
>     to stop factor coercing
>             data.frame(nOccurrences = unlist(nOccur), cumulative =
>     unlist(cumulative),
>                  lane = unlist(lane), row.names=NULL)  ## Removing row names
>          })
>
>     print(tmp)
>         nOccurrences   cumulative           lane
>     1             1 0.0008340284 s_2_export.txt
>     2             2 0.0091743119 s_2_export.txt
>     3             3 0.0116763970 s_2_export.txt
>     4             5 0.0158465388 s_2_export.txt
>     5            10 0.0241868224 s_2_export.txt
>     6             1 0.6430358632 s_2_export.txt
>     7             2 0.6513761468 <tel:0.6513761468> s_2_export.txt
>     8             3 0.6538782319 s_2_export.txt
>     9             9 0.6613844871 <tel:0.6613844871> s_2_export.txt
>     10            1 0.9983319433 s_2_export.txt
>     11            2 1.0000000000 s_2_export.txt
>
>
>
>     One suggestion is that the first part of this code (or similar)
>     would be useful  in the
>     help so example(qa) could be run and testing with reporoducible
>     examples can be quicker.
>
>     cheers,
>
>     Marcus
>
>
>     sessionInfo()
>     R version 2.13.0 (2011-04-13)
>     Platform: i386-pc-mingw32/i386 (32-bit)
>
>     locale:
>     [1] LC_COLLATE=English_New Zealand.1252  LC_CTYPE=English_New
>     Zealand.1252
>     [3] LC_MONETARY=English_New Zealand.1252 LC_NUMERIC=C
>     [5] LC_TIME=English_New Zealand.1252
>
>     attached base packages:
>     [1] stats     graphics  grDevices utils     datasets  methods   base
>
>     other attached packages:
>     [1] ShortRead_1.10.3    Rsamtools_1.4.1     lattice_0.19-23
>     [4] Biostrings_2.20.1   GenomicRanges_1.4.3 IRanges_1.10.3
>
>     loaded via a namespace (and not attached):
>     [1] Biobase_2.12.1 grid_2.13.0    hwriter_1.3
>
>
>
>


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793



More information about the Bioc-sig-sequencing mailing list