[BioC] Rsubread/Rsamtools mappings outside of chromosome ranges

Wei Shi shi at wehi.EDU.AU
Mon Feb 13 11:29:21 CET 2012


Dear Thomas,

Sorry for the problem you have encountered. This was a bug we have identified a little while ago and it has been fixed in the C version of subread aligner (http://subread.sourceforge.net). I'm now updating the C code in Rsubread package to fix this and will let you know when this is done (should be in a couple of days).

Cheers,
Wei

On Feb 13, 2012, at 11:12 AM, Thomas Girke wrote:

> Thanks Martin for your suggestion!
> 
> Thomas
> 
> On Sun, Feb 12, 2012 at 05:28:21PM +0000, Martin Morgan wrote:
>> On 02/10/2012 07:44 PM, Thomas Girke wrote:
>>> With some Illumina libraries I have been running into problems importing
>>> the read mappings from Rsubread into Rsamtools. After some testing I
>>> found out that some reads reported in Rsubread's SAM output have their
>>> end positions outside of the chromosome ranges. If those reads are
>>> removed from the SAM file then the import into Rsamtools works just
>>> fine. Below is a reproducible example of this problem.
>>> 
>>> I could think of several solutions to fix this, e.g. not reporting reads
>>> outside of chromosome ranges or updating their length in the CIAGR string.
>>> An additional option could be to remove invalid mappings during the import
>>> into Rsamtools.
>> 
>> Hi Thomas --
>> 
>> for the latter and for the case where those reads are intrinsically not 
>> interesting, it might be reasonable to specify a range on 
>> readGappedAlignments that precludes the possibility of extension beyond 
>> sequence ends.
>> 
>>> si <- seqinfo(BamFile(fl))
>>> gr <- GRanges(seqnames(si), IRanges(34, seqlengths(si)-34))
>>> bam <- readGappedAlignments(fl, param=ScanBamParam(which=gr))
>> 
>> One would like a better solution, either doing this automatically with a 
>> warning, or warning rather than stopping on reads overlapping ends.
>> 
>> Martin
>> 
>>> 
>>> Thanks,
>>> 
>>> Thomas
>>> 
>>> ################################
>>> ## Read mapping with Rsubread ##
>>> ################################
>>> library(Rsubread)
>>> ## Reference source: ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/
>>> ## Fastq source: ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP%2FSRP009%2FSRP009850/SRR390302/
>>> buildindex(basename="tair10chr.fasta", reference="tair10chr.fasta")
>>> align(index="tair10chr.fasta", readfile1="SRR390302.fastq", output_file="SRR390302_subread.sam", nthreads=8, indels=1, TH1=2)
>>> 
>>> #####################################
>>> ## Process SAM file with Rsamtools ##
>>> #####################################
>>> library(Rsamtools)
>>> asBam(file="SRR390302_subread.sam", destination="SRR390302_subread")
>>> [1] "./data/SRR390302_subread.bam"
>>> reads<- readBamGappedAlignments("SRR390302_subread.bam", use.names=FALSE)
>>> Error in validObject(.Object) :
>>>   invalid class "GappedAlignments" object: 'ranges' contains values outside of sequence bounds
>>> 
>>> ## The error disappears if the following two reads are removed from the SAM file:
>>> ## SRR390302.434558 and SRR390302.2716043. Both reads have their end positions outside
>>> ## the range of Chr5 as illustrated here:
>>> 
>>> ## Relevant chunks of SAM file generated by Rsubread
>>> @SQ     SN:Chr1 LN:30427671
>>> @SQ     SN:Chr2 LN:19698289
>>> @SQ     SN:Chr3 LN:23459830
>>> @SQ     SN:Chr4 LN:18585056
>>> @SQ     SN:Chr5 LN:26975502
>>> @SQ     SN:ChrC LN:154478
>>> @SQ     SN:ChrM LN:366924
>>> SRR390302.1     0       Chr1    27018257        2.0     35M     *       0       0       TGG...ACC     III...GD)0
>>> SRR390302.2     4       *       0       0       *       *       0       0       TCC...AAA     III...+I@
>>> ...
>>> SRR390302.434558        0       Chr5    26975485        5.0     35M     *       0       0       ATG...TGG     III...&4(
>>> SRR390302.2716043       0       Chr5    26975486        3.0     35M     *       0       0       CAT...GGT     III...+2&
>>> 
>>> 
>>>> sessionInfo()
>>> R version 2.14.0 (2011-10-31)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>> 
>>> locale:
>>> [1] C
>>> 
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>> 
>>> other attached packages:
>>> [1] Rsamtools_1.6.3     Biostrings_2.22.0   GenomicRanges_1.6.4 IRanges_1.12.5      Rsubread_1.4.2
>>> 
>>> loaded via a namespace (and not attached):
>>> [1] BSgenome_1.22.0    RCurl_1.8-0        XML_3.6-2          bitops_1.0-4.1     rtracklayer_1.14.4 tools_2.14.0       zlibbioc_1.0.0
>>> 
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>> 
>> 
>> -- 
>> Computational Biology
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>> 
>> Location: M1-B861
>> Telephone: 206 667-2793
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


______________________________________________________________________
The information in this email is confidential and intend...{{dropped:6}}



More information about the Bioconductor mailing list