[BioC] ShortRead error when reading BAM file

Alex Gutteridge alexg at ruggedtextile.com
Fri Sep 16 12:36:31 CEST 2011


 On Thu, 15 Sep 2011 06:19:29 -0700, Martin Morgan wrote:
> On 09/15/2011 02:35 AM, Alex Gutteridge wrote:
>> I'm trying to load a BAM file generated by Mosaik using ShortRead, 
>> but
>> I'm getting the following error:
>>
>>> aln.bam = 
>>> readAligned("data/ALIGNMENT/A430001.1.samtools.bam",type="BAM")
>> Error: Input/Output
>> 'readAligned' failed to parse files
>> dirPath: 'data/ALIGNMENT/A430001.1.samtools.bam'
>> pattern: ''
>> type: 'BAM'
>> error: INTEGER() can only be applied to a 'integer', not a 'symbol'
>>> sessionInfo()
>> R version 2.12.0 (2010-10-15)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] ShortRead_1.8.1 Rsamtools_1.2.3 lattice_0.19-13
>> [4] Biostrings_2.18.0 GenomicRanges_1.2.1 IRanges_1.8.2
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.10.0 grid_2.12.0 hwriter_1.2 tools_2.12.0
>>
>> I ran samtools from the command-line over the original Mosiak BAM 
>> file
>> and it completed fine:
>>
>> samtools view -b A430001.1.bam > A430001.1.samtools.bam
>>
>> but I get the above error on both the Mosaik original and samtools
>> processed BAM file.
>>
>> I also tried the debug suggested here:
>> https://stat.ethz.ch/pipermail/bioconductor/2010-October/035745.html 
>> but
>> it segfaulted:
>>
>>> param = ScanBamParam(simpleCigar = TRUE, reverseComplement = TRUE,
>> + what = ShortRead:::.readAligned_bamWhat())
>>>
>>> res = scanBam('data/ALIGNMENT/A430001.1.samtools.bam', param=param)
>>
>> *** caught segfault ***
>> address (nil), cause 'unknown'
>>
>> Traceback:
>> 1: .Call(func, file, index, "rb", NULL, flag, simpleCigar, ...)
>> 2: .io_bam(.scan_bam, file, index, reverseComplement, tmpl, param = 
>> param)
>> 3: scanBam("data/ALIGNMENT/A430001.1.samtools.bam", param = param)
>> 4: scanBam("data/ALIGNMENT/A430001.1.samtools.bam", param = param)
>>
>> Any suggestions to debug the file would be gratefully accepted.
>
> Hi Alex --
>
> I'd stick with
>
> param = ScanBamParam(simpleCigar = TRUE, reverseComplement = TRUE,
>     what = ShortRead:::.readAligned_bamWhat())
> res = scanBam('data/ALIGNMENT/A430001.1.samtools.bam', param=param)
>
> as the starting point for debugging.
>
> My first suggestion is to update R to R-2-13.1, install Rsamtools,
> and try again.
>
> The next is more complicated, but not to bad. Start R with the 'gdb'
> debugger, provoke the error, and then find where the error occurred.
> It'll look some thing like
>
>   R -d gdb
>   gdb> run
>   ...
>   > ## now at the R prompt, do what you need to segfault
>   ...
>   gdb> where
>
> you'll have to type the 'run' and 'where' commands; 'where' will
> generate a backtrace, and if you could forward that to me (e.g., copy
> / paste) that would be great.
>
> Martin

 Hi Martin,

 Slightly different traceback with latest version, but essentially the 
 same:

> library(ShortRead)
 Loading required package: IRanges

 Attaching package: 'IRanges'

 The following object(s) are masked from 'package:base':

     cbind, eval, intersect, Map, mapply, order, paste, pmax, pmax.int,
     pmin, pmin.int, rbind, rep.int, setdiff, table, union

 Loading required package: GenomicRanges
 Loading required package: Biostrings
 Loading required package: lattice
 Loading required package: Rsamtools
> aln = readAligned("data/ALIGNMENT/A430001.1.bam",type="BAM")
 Error: Input/Output
   'readAligned' failed to parse files
   dirPath: 'data/ALIGNMENT/A430001.1.bam'
   pattern: ''
   type: 'BAM'
   error: INTEGER() can only be applied to a 'integer', not a 'symbol'
   file: data/ALIGNMENT/A430001.1.bam
> param = ScanBamParam(simpleCigar = TRUE, reverseComplement = TRUE,
 +     what = ShortRead:::.readAligned_bamWhat())
> res = scanBam('data/ALIGNMENT/A430001.1.samtools.bam', param=param)
  *** caught segfault ***
 address (nil), cause 'unknown'

 Traceback:
  1: .Call(func, .extptr(file), space, flag, simpleCigar, ...)
  2: doTryCatch(return(expr), name, parentenv, handler)
  3: tryCatchOne(expr, names, parentenv, handlers[[1L]])
  4: tryCatchList(expr, classes, parentenv, handlers)
  5: tryCatch({    .Call(func, .extptr(file), space, flag, simpleCigar, 
 ...)}, error = function(err) {    stop(conditionMessage(err), "\n  file: 
 ", path(file))})
  6: .io_bam(.scan_bamfile, file, param = param, path(file), 
 index(file),     "rb", reverseComplement, tmpl)
  7: scanBam(bam, param = param)
  8: scanBam(bam, param = param)
  9: eval(expr, envir, enclos)
 10: eval(call, sys.frame(sys.parent()))
 11: callGeneric(bam, ..., param = param)
 12: scanBam("data/ALIGNMENT/A430001.1.samtools.bam",     param = param)
 13: scanBam("data/ALIGNMENT/A430001.1.samtools.bam",     param = param)

 ###############

> sessionInfo()
 R version 2.13.1 (2011-07-08)
 Platform: x86_64-unknown-linux-gnu (64-bit)

 locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
 [1] ShortRead_1.10.4    Rsamtools_1.4.3     lattice_0.19-30
 [4] Biostrings_2.20.3   GenomicRanges_1.4.8 IRanges_1.10.6

 loaded via a namespace (and not attached):
 [1] Biobase_2.12.2 grid_2.13.1    hwriter_1.3

 ###########

 Here is the result of segfaulting under gdb:

 #########

 Program received signal SIGSEGV, Segmentation fault.
 R_gc_internal (size_needed=0) at memory.c:1427
 1427                    SEXP next = NEXT_NODE(s);
 (gdb) where
 #0  R_gc_internal (size_needed=0) at memory.c:1427
 #1  0x000000000041e7f3 in Rf_cons (car=0x1bf38af0, cdr=0x163ef338)
     at memory.c:2083
 #2  0x0000000000555780 in Rf_evalList (el=0x1be8f6e0, rho=0x1c31a7a0,
     call=0x1be8f6a8, n=1) at eval.c:1836
 #3  0x0000000000554a17 in Rf_eval (e=0x1be8f6a8, rho=0x1c31a7a0) at 
 eval.c:501
 #4  0x0000000000555580 in Rf_evalListKeepMissing (el=0x1c31a2f0,
     rho=0x1c31a7a0) at eval.c:1900
 #5  0x0000000000555ba5 in Rf_DispatchOrEval (call=0x1c31a360, 
 op=0x1640f938,
     generic=0x616594 "[<-", args=0x1c31a328, rho=0x1c31a7a0,
     ans=0x7fff256b5858, dropmissing=0, argsevald=0) at eval.c:2381
 #6  0x000000000048f300 in do_subassign (call=0x0, op=0x8d1d78,
     args=0x5443474741544747, rho=0x1c31a7a0) at subassign.c:1313
 #7  0x0000000000554981 in Rf_eval (e=0x1c31a360, rho=0x1c31a7a0) at 
 eval.c:482
 #8  0x0000000000557324 in do_set (call=0x1c31a408, op=0x1640e788,
     args=0x1c31a3d0, rho=0x1c31a7a0) at eval.c:1722
 #9  0x0000000000554981 in Rf_eval (e=0x1c31a408, rho=0x1c31a7a0) at 
 eval.c:482
 #10 0x0000000000556c84 in applydefine (call=<value optimized out>,
     op=0x1640e788, args=<value optimized out>, rho=0x1c31a7a0) at 
 eval.c:1678
 #11 0x0000000000554981 in Rf_eval (e=0x1be8f590, rho=0x1c31a7a0) at 
 eval.c:482
 #12 0x00000000005560ee in do_begin (call=0x1be8f360, op=0x1640e590,
     args=0x5443474741544747, rho=0x1c31a7a0) at eval.c:1420
 #13 0x0000000000554981 in Rf_eval (e=0x1be8f360, rho=0x1c31a7a0) at 
 eval.c:482

-- 
 Alex Gutteridge



More information about the Bioconductor mailing list