[BioC] Rsamtools: sortBam() and 'samtools sort' does not give the same results
Martin Morgan
mtmorgan at fhcrc.org
Fri Aug 23 01:13:35 CEST 2013
On 08/22/2013 02:13 PM, Henrik Bengtsson wrote:
> Hi,
>
> I've got a BAM file 'foo.bam' [425,850,792 bytes] that I'm trying to
> sort. It was generated using BWA. I fail to sort it with
> Rsamtools::sortBam() [Rsamtools v1.12.4] whereas with 'samtools sort'
> [samtools v0.1.18 (r982:295)] it works.
>
> BAM FILE:
> % samtools view -H foo0.bam | head
> @HD VN:1.3 SO:coordinate
> @SQ SN:1 LN:249250621
> @SQ SN:2 LN:243199373
> @SQ SN:3 LN:198022430
> @SQ SN:4 LN:191154276
> @SQ SN:5 LN:180915260
> @SQ SN:6 LN:171115067
> @SQ SN:7 LN:159138663
> @SQ SN:8 LN:146364022
> @SQ SN:9 LN:141213431
>
> The following:
>> Rsamtools::sortBam("foo.bam", "foo.sort.byR")
>
> outputs file 'foo.sort.byR.bam' [425,293,387 bytes], but when trying
> to generate an index, both indexBam() and 'samtools index' throws an
> error:
>
>> Rsamtools::indexBam("foo.sort.byR.bam")
> Error in FUN("foo.sort.byR.bam"[[1L]], ...) : failed to build index
> file: foo.sort.byR.bam
> In addition: Warning messages:
> 1: In FUN("foo.sort.byR.bam"[[1L]], ...) :
> [bam_index_core] the alignment is not sorted: reads without
> coordinates prior to reads with coordinates.
> 2: In FUN("foo.sort.byR.bam"[[1L]], ...) :
> [bam_index_build2] fail to index the BAM file.
>
> and
>
> % samtools index foo.sort.byR.bam
> [bam_index_core] the alignment is not sorted: reads without
> coordinates prior to reads with coordinates.
> [bam_index_build2] fail to index the BAM file.
>
> Trying to call sortBam() multiple times changes nothing. However, if
> I use 'samtools sort' to sort the BAM file, it all works:
>
> % samtools sort foo.bam foo.sort
> [bam_sort_core] merging from 3 files...
>
> outputs file 'foo.sort.bam' [425,254,135 bytes]. With this file both
>
> % samtools index foo.sort.bam
>
> and
>
>> Rsamtools::indexBam("foo.sort.bam")
>
> work. They both generate identical file 'foo.sort.bam.bai' [8,274,272 bytes].
>
> I've tried also with a toy sample BAM file 'longreads.bam' [1,923,379
> bytes] and there I don't get any errors (although Rsamtools and
> 'samtools sort' do output differently sized *.sort.bam files).
>
> I've never had this issue before. Comments?
I think samtools-0.1.18 is from Sept 2, 2011 (from git)
Author: Heng Li <lh3 at live.co.uk>
Date: Fri Sep 2 16:57:34 2011 +0000
* Updated samtools to the latest
* Added seqtk.c
* Updated wgsim.c
* release samtools-0.1.18
versus (not sure why this didn't get into the NEWS file) in Rsamtools in release
r74547 | mtmorgan at fhcrc.org | 2013-03-18 22:45:53 -0700 (Mon, 18 Mar 2013) | 4 l
ines
update samtools to 0.1.19 beta
so one possibility is a regression in samtools. Happy to investigate further if
you've got a reproducible example.
Martin
>
> /Henrik
>
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] Rsamtools_1.12.4 Biostrings_2.28.0 GenomicRanges_1.12.4
> [4] IRanges_1.18.2 BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-6 stats4_3.0.1 tools_3.0.1 zlibbioc_1.6.0
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioconductor
mailing list