[BioC] Rsamtools: sortBam() and 'samtools sort' does not give the same results
Henrik Bengtsson
hb at biostat.ucsf.edu
Thu Aug 22 23:13:11 CEST 2013
Hi,
I've got a BAM file 'foo.bam' [425,850,792 bytes] that I'm trying to
sort. It was generated using BWA. I fail to sort it with
Rsamtools::sortBam() [Rsamtools v1.12.4] whereas with 'samtools sort'
[samtools v0.1.18 (r982:295)] it works.
BAM FILE:
% samtools view -H foo0.bam | head
@HD VN:1.3 SO:coordinate
@SQ SN:1 LN:249250621
@SQ SN:2 LN:243199373
@SQ SN:3 LN:198022430
@SQ SN:4 LN:191154276
@SQ SN:5 LN:180915260
@SQ SN:6 LN:171115067
@SQ SN:7 LN:159138663
@SQ SN:8 LN:146364022
@SQ SN:9 LN:141213431
The following:
> Rsamtools::sortBam("foo.bam", "foo.sort.byR")
outputs file 'foo.sort.byR.bam' [425,293,387 bytes], but when trying
to generate an index, both indexBam() and 'samtools index' throws an
error:
> Rsamtools::indexBam("foo.sort.byR.bam")
Error in FUN("foo.sort.byR.bam"[[1L]], ...) : failed to build index
file: foo.sort.byR.bam
In addition: Warning messages:
1: In FUN("foo.sort.byR.bam"[[1L]], ...) :
[bam_index_core] the alignment is not sorted: reads without
coordinates prior to reads with coordinates.
2: In FUN("foo.sort.byR.bam"[[1L]], ...) :
[bam_index_build2] fail to index the BAM file.
and
% samtools index foo.sort.byR.bam
[bam_index_core] the alignment is not sorted: reads without
coordinates prior to reads with coordinates.
[bam_index_build2] fail to index the BAM file.
Trying to call sortBam() multiple times changes nothing. However, if
I use 'samtools sort' to sort the BAM file, it all works:
% samtools sort foo.bam foo.sort
[bam_sort_core] merging from 3 files...
outputs file 'foo.sort.bam' [425,254,135 bytes]. With this file both
% samtools index foo.sort.bam
and
> Rsamtools::indexBam("foo.sort.bam")
work. They both generate identical file 'foo.sort.bam.bai' [8,274,272 bytes].
I've tried also with a toy sample BAM file 'longreads.bam' [1,923,379
bytes] and there I don't get any errors (although Rsamtools and
'samtools sort' do output differently sized *.sort.bam files).
I've never had this issue before. Comments?
/Henrik
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] Rsamtools_1.12.4 Biostrings_2.28.0 GenomicRanges_1.12.4
[4] IRanges_1.18.2 BiocGenerics_0.6.0
loaded via a namespace (and not attached):
[1] bitops_1.0-6 stats4_3.0.1 tools_3.0.1 zlibbioc_1.6.0
More information about the Bioconductor
mailing list