[Bioc-devel] Rsamtools quitting ungracefully when indexing unsorted BAM file

Martin Morgan mtmorgan at fhcrc.org
Thu Dec 16 18:54:07 CET 2010


On 12/16/2010 02:38 AM, florian.hahne at novartis.com wrote:
> Hi List, Martin,
> when trying to index an unsorted BAM file, R crashes pretty violently:
>> indexBam("21_Unique_Hit_Alignment.bam")
> [bam_index_core] the alignment is not sorted
> (HWUSI-EAS1513_0004:3:92:7980:8996#0/1): 66179515 > 66178275 in 4-th chr
> hahnefl1 at chbslx1151
> /usr/people/ts/hahnefl1/projects/isilon_itox/genomatix/DNS/output/s_3_sequence.txt/genome\
> 
> 
>> sessionInfo()
> R version 2.13.0 Under development (unstable) (2010-12-09 r53820)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> 
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8  
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C      
> 
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     methods   base    
> 
> other attached packages:
> [1] Rsamtools_1.3.11     Biostrings_2.19.0    GenomicRanges_1.1.38
> [4] IRanges_1.9.17      
> 
> loaded via a namespace (and not attached):
> [1] Biobase_2.11.0
>>
> 
> A friendly reminder to sort the file would probably do... Or maybe the
> file could be sorted somewhere in the background and then indexed?
> 
> Another suggestion: wouldn't it be super cool to expose parts of the
> samtools view functionality to convert between SAM and BAM files? It
> seems a bit odd that you can sort and index BAM files but can't convert
> them from SAM files, which still seem to be the preferred file type for
> many applications.

Hi Florian --

I checked code into the devel branch (1.3.13) that instead throws a
warning (that the file isn't sorted) and then an error (when samtools
tries to exit() from its C code), so in principle one can continue in
R. Things are now handled more gracefully.

The solution isn't perfect: the samtools library expects to exit at
that point, so it could be leaving things in a confused state (probably
just memory leaks).
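
A minimal sketch of how calling code might cope with the warning-then-error
behaviour described above: note the "not sorted" warning, catch the error,
then sort and index the sorted copy. The helper name index_with_fallback and
its arguments are hypothetical, not part of Rsamtools, and the exact wording
of the warning is an assumption based on the samtools message quoted above.
The index and sort functions are passed in as arguments so the control flow
can be tried without a BAM file.

```r
# Sketch only: index_with_fallback is a hypothetical helper, not an
# Rsamtools function. index_fun / sort_fun would typically be
# Rsamtools::indexBam and Rsamtools::sortBam.
index_with_fallback <- function(bam, index_fun, sort_fun,
                                sorted_prefix = sub("\\.bam$", "_sorted", bam)) {
    unsorted_seen <- FALSE
    result <- withCallingHandlers(
        # The error raised when samtools tries to exit() is caught here.
        tryCatch(index_fun(bam), error = function(e) NULL),
        warning = function(w) {
            # Assumption: the warning text contains "not sorted".
            if (grepl("not sorted", conditionMessage(w), fixed = TRUE))
                unsorted_seen <<- TRUE
            invokeRestart("muffleWarning")
        })
    if (!is.null(result))
        return(result)                  # indexing worked first time
    if (!unsorted_seen)
        stop("indexing failed for a reason other than sort order: ", bam)
    # Assumption: sort_fun(file, destination) returns the sorted file's name.
    sorted <- sort_fun(bam, sorted_prefix)
    index_fun(sorted)
}
```

With Rsamtools loaded, a call might look like
index_with_fallback("21_Unique_Hit_Alignment.bam", indexBam, sortBam).
Given the caveat about samtools possibly being left in a confused state,
restarting R before doing further work may still be the safer choice.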

I'll add SAM -> BAM functionality; it'll suffer from the same type of issue.

Martin

> Best regards, Mit freundlichen Grüssen, Meilleures salutations,
> 
> Florian Hahne
> Novartis Institute For Biomedical Research
> Translational Sciences / PreClinical Safety / investigativeToxicology
> (iTOX)
> Expert Data Integration and Modeling Bioinformatics
> CHBS, WKL-135.1.67
> Novartis Institute For Biomedical Research, Werk Klybeck
> Klybeckstrasse 141
> CH-4057 Basel
> Switzerland
> Phone: +41 61 6967127
> Email: florian.hahne at novartis.com
> 
> 
> 


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


