[BioC] Rsubread/Rsamtools mappings outside of chromosome ranges
Martin Morgan
mtmorgan at fhcrc.org
Sun Feb 12 18:28:21 CET 2012
On 02/10/2012 07:44 PM, Thomas Girke wrote:
> With some Illumina libraries I have been running into problems importing
> the read mappings from Rsubread into Rsamtools. After some testing I
> found out that some reads reported in Rsubread's SAM output have their
> end positions outside of the chromosome ranges. If those reads are
> removed from the SAM file then the import into Rsamtools works just
> fine. Below is a reproducible example of this problem.
>
> I could think of several solutions to fix this, e.g. not reporting reads
> outside of chromosome ranges or updating their length in the CIAGR string.
> An additional option could be to remove invalid mappings during the import
> into Rsamtools.
Hi Thomas --
for the latter and for the case where those reads are intrinsically not
interesting, it might be reasonable to specify a range on
readGappedAlignments that precludes the possibility of extension beyond
sequence ends.
> si <- seqinfo(BamFile(fl))
> gr <- GRanges(seqnames(si), IRanges(34, seqlengths(si)-34))
> bam <- readGappedAlignments(fl, param=ScanBamParam(which=gr))
One would like a better solution, either doing this automatically with a
warning, or warning rather than stopping on reads overlapping ends.
Martin
>
> Thanks,
>
> Thomas
>
> ################################
> ## Read mapping with Rsubread ##
> ################################
> library(Rsubread)
> ## Reference source: ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/
> ## Fastq source: ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP%2FSRP009%2FSRP009850/SRR390302/
> buildindex(basename="tair10chr.fasta", reference="tair10chr.fasta")
> align(index="tair10chr.fasta", readfile1="SRR390302.fastq", output_file="SRR390302_subread.sam", nthreads=8, indels=1, TH1=2)
>
> #####################################
> ## Process SAM file with Rsamtools ##
> #####################################
> library(Rsamtools)
> asBam(file="SRR390302_subread.sam", destination="SRR390302_subread")
> [1] "./data/SRR390302_subread.bam"
> reads<- readBamGappedAlignments("SRR390302_subread.bam", use.names=FALSE)
> Error in validObject(.Object) :
> invalid class "GappedAlignments" object: 'ranges' contains values outside of sequence bounds
>
> ## The error disappears if the following two reads are removed from the SAM file:
> ## SRR390302.434558 and SRR390302.2716043. Both reads have their end positions outside
> ## the range of Chr5 as illustrated here:
>
> ## Relevant chunks of SAM file generated by Rsubread
> @SQ SN:Chr1 LN:30427671
> @SQ SN:Chr2 LN:19698289
> @SQ SN:Chr3 LN:23459830
> @SQ SN:Chr4 LN:18585056
> @SQ SN:Chr5 LN:26975502
> @SQ SN:ChrC LN:154478
> @SQ SN:ChrM LN:366924
> SRR390302.1 0 Chr1 27018257 2.0 35M * 0 0 TGG...ACC III...GD)0
> SRR390302.2 4 * 0 0 * * 0 0 TCC...AAA III...+I@
> ...
> SRR390302.434558 0 Chr5 26975485 5.0 35M * 0 0 ATG...TGG III...&4(
> SRR390302.2716043 0 Chr5 26975486 3.0 35M * 0 0 CAT...GGT III...+2&
>
>
>> sessionInfo()
> R version 2.14.0 (2011-10-31)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] Rsamtools_1.6.3 Biostrings_2.22.0 GenomicRanges_1.6.4 IRanges_1.12.5 Rsubread_1.4.2
>
> loaded via a namespace (and not attached):
> [1] BSgenome_1.22.0 RCurl_1.8-0 XML_3.6-2 bitops_1.0-4.1 rtracklayer_1.14.4 tools_2.14.0 zlibbioc_1.0.0
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
More information about the Bioconductor
mailing list