[BioC] trimTails function in ShortRead package give different results on the same input
Martin Morgan
mtmorgan at fhcrc.org
Wed Oct 17 23:21:57 CEST 2012
On 10/15/2012 5:34 AM, Zhenyu Xu wrote:
> Hi ShortRead package developer,
>
> I tried to use the function trimTails to trim low-quality bases from reads produced by a 454 sequencing machine. However, I get different results when I run the command several times, starting from the same ShortReadQ object and using the same trimming parameters. I observed this on CentOS Linux machines (6.2 and 6.3). I also tried it on my own Mac, where the results are identical between runs. So the problem seems to be restricted to CentOS Linux (I am not sure whether other Linux platforms are affected). The data set (~11Mb) can be downloaded at http://dl.dropbox.com/u/68829208/454reads.rds.
Thank you for the bug report, data, and reproducible example. This has been
fixed in ShortRead 1.16.1 and in the devel branch, and should be available via
biocLite after about 10am Seattle time, tomorrow.
The problem occurred only with successive=TRUE.
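
To confirm that repeated calls now agree after updating, something like the following could be used. This is only a minimal sketch: is_deterministic is a hypothetical helper written for illustration, not part of ShortRead, and the commented usage assumes the readsSub object and the trimTails arguments from the report below.

```r
# Hypothetical helper (not part of ShortRead): call `f` `n` times and
# report whether every result is identical to the first one.
is_deterministic <- function(f, n = 5) {
    stopifnot(is.function(f), n >= 2)
    results <- lapply(seq_len(n), function(i) f())
    all(vapply(results[-1], identical, logical(1), results[[1]]))
}

# Usage with the objects from the report (needs ShortRead and the
# downloaded 454reads.rds, so it is left commented out here):
# library(ShortRead)
# readsSub <- readRDS("454reads.rds")
# is_deterministic(function()
#     width(trimTails(readsSub, 20, "5", successive=TRUE)))
```

Comparing the width() vectors with identical() checks every read, which is stricter than comparing the printed width ranges by eye.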
Martin
> best,
> zhenyu
>
> Please see the following session, which reproduces the problem:
>
> wget http://dl.dropbox.com/u/68829208/454reads.rds
> R
>
> R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"
> Copyright (C) 2012 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>
> Natural language support but running in an English locale
>
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
>
>> library(ShortRead)
> Loading required package: BiocGenerics
>
> Attaching package: ‘BiocGenerics’
>
> The following object(s) are masked from ‘package:stats’:
>
> xtabs
>
> The following object(s) are masked from ‘package:base’:
>
> anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find,
> get, intersect, lapply, Map, mapply, mget, order, paste, pmax,
> pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int,
> rownames, sapply, setdiff, table, tapply, union, unique
>
> Loading required package: IRanges
> Loading required package: GenomicRanges
> Loading required package: Biostrings
> Loading required package: lattice
> Loading required package: Rsamtools
> Loading required package: latticeExtra
> Loading required package: RColorBrewer
>> readsSub <- readRDS("454reads.rds")
>> readsSub
> class: ShortReadQ
> length: 5460 reads; width: 5..424 cycles
>> trimTails(readsSub, 20, "5", successive=TRUE)
> class: ShortReadQ
> length: 5460 reads; width: 3..416 cycles
>> trimTails(readsSub, 20, "5", successive=TRUE)
> class: ShortReadQ
> length: 5460 reads; width: 3..416 cycles
>> trimTails(readsSub, 20, "5", successive=TRUE)
> class: ShortReadQ
> length: 5460 reads; width: 4..424 cycles
>> trimTails(readsSub, 20, "5", successive=TRUE)
> class: ShortReadQ
> length: 5460 reads; width: 5..416 cycles
>> trimTails(readsSub, 20, "5", successive=TRUE)
> class: ShortReadQ
> length: 5460 reads; width: 4..424 cycles
>> x = trimTails(readsSub, 20, "5", successive=TRUE)
>> y = trimTails(readsSub, 20, "5", successive=TRUE)
>> sum(width(x)!=width(y))
> [1] 1325
>> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
> [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] ShortRead_1.14.4 latticeExtra_0.6-19 RColorBrewer_1.0-5
> [4] Rsamtools_1.8.5 lattice_0.20-6 Biostrings_2.24.1
> [7] GenomicRanges_1.8.9 IRanges_1.14.4 BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.16.0 bitops_1.0-4.1 grid_2.15.1 hwriter_1.3 stats4_2.15.1
> [6] zlibbioc_1.2.0
--
Dr. Martin Morgan, PhD
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109