[BioC] ShortRead error when reading BAM file

Martin Morgan mtmorgan at fhcrc.org
Fri Sep 16 15:43:51 CEST 2011


Hi Alex --

Unfortunately, not informative; it looks like the call has completed but 
corrupted memory in the process.

Any chance of a small data set, e.g., using samtools to select a portion 
of the file? Along the lines of

   samtools view -b A430001.1.bam chr1:1-1000000 > A430001.1.subset.bam

Or is there a reference MOSAIK output file somewhere that I can access?

Also, does specifying 'which' help, e.g.,

param = ScanBamParam(simpleCigar = TRUE, reverseComplement = TRUE,
   what = ShortRead:::.readAligned_bamWhat(),
   which=GRanges("chr1", IRanges(1, 100000)))

And if you're being indulgent and none of the above point to the problem 
/ are doable, create a file ~/.R/Makevars with

   CFLAGS += -g O0 -Dinline=""

(that's the letter O and the number zero) and reinstalling Rsamtools 
(this'll compile without any C optimizations) then

   R -d valgrind -f test.R

looking for output that says 'invalid read' or 'invalid write'. Thanks 
for working with me on this.

Martin

On 09/16/2011 03:36 AM, Alex Gutteridge wrote:
> On Thu, 15 Sep 2011 06:19:29 -0700, Martin Morgan wrote:
>> On 09/15/2011 02:35 AM, Alex Gutteridge wrote:
>>> I'm trying to load a BAM file generated by Mosaik using ShortRead, but
>>> I'm getting the following error:
>>>
>>>> aln.bam =
>>>> readAligned("data/ALIGNMENT/A430001.1.samtools.bam",type="BAM")
>>> Error: Input/Output
>>> 'readAligned' failed to parse files
>>> dirPath: 'data/ALIGNMENT/A430001.1.samtools.bam'
>>> pattern: ''
>>> type: 'BAM'
>>> error: INTEGER() can only be applied to a 'integer', not a 'symbol'
>>>> sessionInfo()
>>> R version 2.12.0 (2010-10-15)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] ShortRead_1.8.1 Rsamtools_1.2.3 lattice_0.19-13
>>> [4] Biostrings_2.18.0 GenomicRanges_1.2.1 IRanges_1.8.2
>>>
>>> loaded via a namespace (and not attached):
>>> [1] Biobase_2.10.0 grid_2.12.0 hwriter_1.2 tools_2.12.0
>>>
>>> I ran samtools from the command-line over the original Mosiak BAM file
>>> and it completed fine:
>>>
>>> samtools view -b A430001.1.bam > A430001.1.samtools.bam
>>>
>>> but I get the above error on both the Mosaik original and samtools
>>> processed BAM file.
>>>
>>> I also tried the debug suggested here:
>>> https://stat.ethz.ch/pipermail/bioconductor/2010-October/035745.html but
>>> it segfaulted:
>>>
>>>> param = ScanBamParam(simpleCigar = TRUE, reverseComplement = TRUE,
>>> + what = ShortRead:::.readAligned_bamWhat())
>>>>
>>>> res = scanBam('data/ALIGNMENT/A430001.1.samtools.bam', param=param)
>>>
>>> *** caught segfault ***
>>> address (nil), cause 'unknown'
>>>
>>> Traceback:
>>> 1: .Call(func, file, index, "rb", NULL, flag, simpleCigar, ...)
>>> 2: .io_bam(.scan_bam, file, index, reverseComplement, tmpl, param =
>>> param)
>>> 3: scanBam("data/ALIGNMENT/A430001.1.samtools.bam", param = param)
>>> 4: scanBam("data/ALIGNMENT/A430001.1.samtools.bam", param = param)
>>>
>>> Any suggestions to debug the file would be gratefully accepted.
>>
>> Hi Alex --
>>
>> I'd stick with
>>
>> param = ScanBamParam(simpleCigar = TRUE, reverseComplement = TRUE,
>> what = ShortRead:::.readAligned_bamWhat())
>> res = scanBam('data/ALIGNMENT/A430001.1.samtools.bam', param=param)
>>
>> as the starting point for debugging.
>>
>> My first suggestion is to update R to R-2-13.1, install Rsamtools,
>> and try again.
>>
>> The next is more complicated, but not to bad. Start R with the 'gdb'
>> debugger, provoke the error, and then find where the error occurred.
>> It'll look some thing like
>>
>> R -d gdb
>> gdb> run
>> ...
>> > ## now at the R prompt, do what you need to segfault
>> ...
>> gdb> where
>>
>> you'll have to type the 'run' and 'where' commands; 'where' will
>> generate a backtrace, and if you could forward that to me (e.g., copy
>> / paste) that would be great.
>>
>> Martin
>
> Hi Martin,
>
> Slightly different traceback with latest version, but essentially the same:
>
>> library(ShortRead)
> Loading required package: IRanges
>
> Attaching package: 'IRanges'
>
> The following object(s) are masked from 'package:base':
>
> cbind, eval, intersect, Map, mapply, order, paste, pmax, pmax.int,
> pmin, pmin.int, rbind, rep.int, setdiff, table, union
>
> Loading required package: GenomicRanges
> Loading required package: Biostrings
> Loading required package: lattice
> Loading required package: Rsamtools
>> aln = readAligned("data/ALIGNMENT/A430001.1.bam",type="BAM")
> Error: Input/Output
> 'readAligned' failed to parse files
> dirPath: 'data/ALIGNMENT/A430001.1.bam'
> pattern: ''
> type: 'BAM'
> error: INTEGER() can only be applied to a 'integer', not a 'symbol'
> file: data/ALIGNMENT/A430001.1.bam
>> param = ScanBamParam(simpleCigar = TRUE, reverseComplement = TRUE,
> + what = ShortRead:::.readAligned_bamWhat())
>> res = scanBam('data/ALIGNMENT/A430001.1.samtools.bam', param=param)
> *** caught segfault ***
> address (nil), cause 'unknown'
>
> Traceback:
> 1: .Call(func, .extptr(file), space, flag, simpleCigar, ...)
> 2: doTryCatch(return(expr), name, parentenv, handler)
> 3: tryCatchOne(expr, names, parentenv, handlers[[1L]])
> 4: tryCatchList(expr, classes, parentenv, handlers)
> 5: tryCatch({ .Call(func, .extptr(file), space, flag, simpleCigar,
> ...)}, error = function(err) { stop(conditionMessage(err), "\n file: ",
> path(file))})
> 6: .io_bam(.scan_bamfile, file, param = param, path(file), index(file),
> "rb", reverseComplement, tmpl)
> 7: scanBam(bam, param = param)
> 8: scanBam(bam, param = param)
> 9: eval(expr, envir, enclos)
> 10: eval(call, sys.frame(sys.parent()))
> 11: callGeneric(bam, ..., param = param)
> 12: scanBam("data/ALIGNMENT/A430001.1.samtools.bam", param = param)
> 13: scanBam("data/ALIGNMENT/A430001.1.samtools.bam", param = param)
>
> ###############
>
>> sessionInfo()
> R version 2.13.1 (2011-07-08)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] ShortRead_1.10.4 Rsamtools_1.4.3 lattice_0.19-30
> [4] Biostrings_2.20.3 GenomicRanges_1.4.8 IRanges_1.10.6
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.12.2 grid_2.13.1 hwriter_1.3
>
> ###########
>
> Here is the result of segfaulting under gdb:
>
> #########
>
> Program received signal SIGSEGV, Segmentation fault.
> R_gc_internal (size_needed=0) at memory.c:1427
> 1427 SEXP next = NEXT_NODE(s);
> (gdb) where
> #0 R_gc_internal (size_needed=0) at memory.c:1427
> #1 0x000000000041e7f3 in Rf_cons (car=0x1bf38af0, cdr=0x163ef338)
> at memory.c:2083
> #2 0x0000000000555780 in Rf_evalList (el=0x1be8f6e0, rho=0x1c31a7a0,
> call=0x1be8f6a8, n=1) at eval.c:1836
> #3 0x0000000000554a17 in Rf_eval (e=0x1be8f6a8, rho=0x1c31a7a0) at
> eval.c:501
> #4 0x0000000000555580 in Rf_evalListKeepMissing (el=0x1c31a2f0,
> rho=0x1c31a7a0) at eval.c:1900
> #5 0x0000000000555ba5 in Rf_DispatchOrEval (call=0x1c31a360, op=0x1640f938,
> generic=0x616594 "[<-", args=0x1c31a328, rho=0x1c31a7a0,
> ans=0x7fff256b5858, dropmissing=0, argsevald=0) at eval.c:2381
> #6 0x000000000048f300 in do_subassign (call=0x0, op=0x8d1d78,
> args=0x5443474741544747, rho=0x1c31a7a0) at subassign.c:1313
> #7 0x0000000000554981 in Rf_eval (e=0x1c31a360, rho=0x1c31a7a0) at
> eval.c:482
> #8 0x0000000000557324 in do_set (call=0x1c31a408, op=0x1640e788,
> args=0x1c31a3d0, rho=0x1c31a7a0) at eval.c:1722
> #9 0x0000000000554981 in Rf_eval (e=0x1c31a408, rho=0x1c31a7a0) at
> eval.c:482
> #10 0x0000000000556c84 in applydefine (call=<value optimized out>,
> op=0x1640e788, args=<value optimized out>, rho=0x1c31a7a0) at eval.c:1678
> #11 0x0000000000554981 in Rf_eval (e=0x1be8f590, rho=0x1c31a7a0) at
> eval.c:482
> #12 0x00000000005560ee in do_begin (call=0x1be8f360, op=0x1640e590,
> args=0x5443474741544747, rho=0x1c31a7a0) at eval.c:1420
> #13 0x0000000000554981 in Rf_eval (e=0x1be8f360, rho=0x1c31a7a0) at
> eval.c:482
>


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793



More information about the Bioconductor mailing list