[BioC] ShortRead error when reading BAM file
Alex Gutteridge
alexg at ruggedtextile.com
Fri Sep 16 12:36:31 CEST 2011
On Thu, 15 Sep 2011 06:19:29 -0700, Martin Morgan wrote:
> On 09/15/2011 02:35 AM, Alex Gutteridge wrote:
>> I'm trying to load a BAM file generated by Mosaik using ShortRead,
>> but I'm getting the following error:
>>
>>> aln.bam = readAligned("data/ALIGNMENT/A430001.1.samtools.bam",type="BAM")
>> Error: Input/Output
>> 'readAligned' failed to parse files
>> dirPath: 'data/ALIGNMENT/A430001.1.samtools.bam'
>> pattern: ''
>> type: 'BAM'
>> error: INTEGER() can only be applied to a 'integer', not a 'symbol'
>>> sessionInfo()
>> R version 2.12.0 (2010-10-15)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] ShortRead_1.8.1 Rsamtools_1.2.3 lattice_0.19-13
>> [4] Biostrings_2.18.0 GenomicRanges_1.2.1 IRanges_1.8.2
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.10.0 grid_2.12.0 hwriter_1.2 tools_2.12.0
>>
>> I ran samtools from the command line over the original Mosaik BAM
>> file, and it completed fine:
>>
>> samtools view -b A430001.1.bam > A430001.1.samtools.bam
>>
>> but I get the above error on both the original Mosaik BAM file and the
>> samtools-processed one.
>>
>> I also tried the debugging approach suggested here:
>> https://stat.ethz.ch/pipermail/bioconductor/2010-October/035745.html
>> but it segfaulted:
>>
>>> param = ScanBamParam(simpleCigar = TRUE, reverseComplement = TRUE,
>> + what = ShortRead:::.readAligned_bamWhat())
>>>
>>> res = scanBam('data/ALIGNMENT/A430001.1.samtools.bam', param=param)
>>
>> *** caught segfault ***
>> address (nil), cause 'unknown'
>>
>> Traceback:
>> 1: .Call(func, file, index, "rb", NULL, flag, simpleCigar, ...)
>> 2: .io_bam(.scan_bam, file, index, reverseComplement, tmpl, param = param)
>> 3: scanBam("data/ALIGNMENT/A430001.1.samtools.bam", param = param)
>> 4: scanBam("data/ALIGNMENT/A430001.1.samtools.bam", param = param)
>>
>> Any suggestions to debug the file would be gratefully accepted.
>
> Hi Alex --
>
> I'd stick with
>
> param = ScanBamParam(simpleCigar = TRUE, reverseComplement = TRUE,
> what = ShortRead:::.readAligned_bamWhat())
> res = scanBam('data/ALIGNMENT/A430001.1.samtools.bam', param=param)
>
> as the starting point for debugging.
>
> My first suggestion is to update R to R 2.13.1, install Rsamtools,
> and try again.
>
> The next is more complicated, but not too bad. Start R under the 'gdb'
> debugger, provoke the error, and then find where the error occurred.
> It'll look something like
>
> R -d gdb
> (gdb) run
> ...
> > ## now at the R prompt, run the commands that trigger the segfault
> ...
> (gdb) where
>
> You'll have to type the 'run' and 'where' commands; 'where' will
> generate a backtrace, and if you could forward that to me (e.g., copy
> and paste it), that would be great.
>
> Martin
Hi Martin,
The traceback is slightly different with the latest version, but
essentially the same:
> library(ShortRead)
Loading required package: IRanges
Attaching package: 'IRanges'
The following object(s) are masked from 'package:base':
cbind, eval, intersect, Map, mapply, order, paste, pmax, pmax.int,
pmin, pmin.int, rbind, rep.int, setdiff, table, union
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: lattice
Loading required package: Rsamtools
> aln = readAligned("data/ALIGNMENT/A430001.1.bam",type="BAM")
Error: Input/Output
'readAligned' failed to parse files
dirPath: 'data/ALIGNMENT/A430001.1.bam'
pattern: ''
type: 'BAM'
error: INTEGER() can only be applied to a 'integer', not a 'symbol'
file: data/ALIGNMENT/A430001.1.bam
> param = ScanBamParam(simpleCigar = TRUE, reverseComplement = TRUE,
+ what = ShortRead:::.readAligned_bamWhat())
> res = scanBam('data/ALIGNMENT/A430001.1.samtools.bam', param=param)
*** caught segfault ***
address (nil), cause 'unknown'
Traceback:
1: .Call(func, .extptr(file), space, flag, simpleCigar, ...)
2: doTryCatch(return(expr), name, parentenv, handler)
3: tryCatchOne(expr, names, parentenv, handlers[[1L]])
4: tryCatchList(expr, classes, parentenv, handlers)
5: tryCatch({ .Call(func, .extptr(file), space, flag, simpleCigar, ...)}, error = function(err) { stop(conditionMessage(err), "\n file: ", path(file))})
6: .io_bam(.scan_bamfile, file, param = param, path(file), index(file), "rb", reverseComplement, tmpl)
7: scanBam(bam, param = param)
8: scanBam(bam, param = param)
9: eval(expr, envir, enclos)
10: eval(call, sys.frame(sys.parent()))
11: callGeneric(bam, ..., param = param)
12: scanBam("data/ALIGNMENT/A430001.1.samtools.bam", param = param)
13: scanBam("data/ALIGNMENT/A430001.1.samtools.bam", param = param)
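Since both readAligned() and scanBam() fail on the same file, one cheap sanity check (a sketch, not something proposed in this thread) is to confirm that the file really starts with a BAM header. BAM files are BGZF-compressed, and BGZF is gzip-compatible, so gzip can decompress them; per the SAM/BAM specification the decompressed stream should begin with the four bytes "BAM\1" (hex 42 41 4d 01). The helper name check_bam_magic is hypothetical.

```shell
#!/bin/sh
# check_bam_magic FILE  (hypothetical helper; a sketch, not part of samtools)
# Decompresses FILE with gzip (BGZF is gzip-compatible) and compares the
# first four decompressed bytes with the BAM magic string "BAM\1".
# Prints "ok" on a match and "bad" otherwise, including for non-gzip input.
check_bam_magic() {
    # od -v -An -tx1 prints the bytes as hex; tr strips spaces and newlines.
    magic=$(gzip -dc "$1" 2>/dev/null | head -c 4 | od -v -An -tx1 | tr -d ' \n')
    if [ "$magic" = "42414d01" ]; then
        echo ok
    else
        echo bad
    fi
}

# Example, using a path from this thread:
# check_bam_magic data/ALIGNMENT/A430001.1.bam
```

A passing check only says the header magic is intact; it does not validate the records that follow.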
###############
> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ShortRead_1.10.4 Rsamtools_1.4.3 lattice_0.19-30
[4] Biostrings_2.20.3 GenomicRanges_1.4.8 IRanges_1.10.6
loaded via a namespace (and not attached):
[1] Biobase_2.12.2 grid_2.13.1 hwriter_1.3
###########
Here is the result of segfaulting under gdb:
#########
Program received signal SIGSEGV, Segmentation fault.
R_gc_internal (size_needed=0) at memory.c:1427
1427 SEXP next = NEXT_NODE(s);
(gdb) where
#0 R_gc_internal (size_needed=0) at memory.c:1427
#1 0x000000000041e7f3 in Rf_cons (car=0x1bf38af0, cdr=0x163ef338) at memory.c:2083
#2 0x0000000000555780 in Rf_evalList (el=0x1be8f6e0, rho=0x1c31a7a0, call=0x1be8f6a8, n=1) at eval.c:1836
#3 0x0000000000554a17 in Rf_eval (e=0x1be8f6a8, rho=0x1c31a7a0) at eval.c:501
#4 0x0000000000555580 in Rf_evalListKeepMissing (el=0x1c31a2f0, rho=0x1c31a7a0) at eval.c:1900
#5 0x0000000000555ba5 in Rf_DispatchOrEval (call=0x1c31a360, op=0x1640f938, generic=0x616594 "[<-", args=0x1c31a328, rho=0x1c31a7a0, ans=0x7fff256b5858, dropmissing=0, argsevald=0) at eval.c:2381
#6 0x000000000048f300 in do_subassign (call=0x0, op=0x8d1d78, args=0x5443474741544747, rho=0x1c31a7a0) at subassign.c:1313
#7 0x0000000000554981 in Rf_eval (e=0x1c31a360, rho=0x1c31a7a0) at eval.c:482
#8 0x0000000000557324 in do_set (call=0x1c31a408, op=0x1640e788, args=0x1c31a3d0, rho=0x1c31a7a0) at eval.c:1722
#9 0x0000000000554981 in Rf_eval (e=0x1c31a408, rho=0x1c31a7a0) at eval.c:482
#10 0x0000000000556c84 in applydefine (call=<value optimized out>, op=0x1640e788, args=<value optimized out>, rho=0x1c31a7a0) at eval.c:1678
#11 0x0000000000554981 in Rf_eval (e=0x1be8f590, rho=0x1c31a7a0) at eval.c:482
#12 0x00000000005560ee in do_begin (call=0x1be8f360, op=0x1640e590, args=0x5443474741544747, rho=0x1c31a7a0) at eval.c:1420
#13 0x0000000000554981 in Rf_eval (e=0x1be8f360, rho=0x1c31a7a0) at eval.c:482
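A file that was truncated while being written or copied is another thing worth ruling out before digging further into the C code. As a hedged sketch (not something suggested in this thread), the SAM/BAM specification defines a fixed 28-byte empty BGZF block as the end-of-file marker, and recent samtools versions warn when it is absent. The hypothetical helper check_bgzf_eof below compares the last 28 bytes of a file with that marker. A missing marker is a hint rather than proof of corruption, since some older writers may have omitted it.

```shell
#!/bin/sh
# check_bgzf_eof FILE  (hypothetical helper; a sketch, not part of samtools)
# Compares the last 28 bytes of FILE with the empty BGZF block that the
# SAM/BAM specification defines as the end-of-file marker.
# Prints "ok" if the marker is present and "missing" otherwise.
BGZF_EOF_HEX="1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00"

check_bgzf_eof() {
    expected=$(printf '%s' "$BGZF_EOF_HEX" | tr -d ' ')
    # tail -c 28 takes the trailing bytes; od -v -An -tx1 prints them as hex.
    actual=$(tail -c 28 "$1" | od -v -An -tx1 | tr -d ' \n')
    if [ "$actual" = "$expected" ]; then
        echo ok
    else
        echo missing
    fi
}

# Example, using a path from this thread:
# check_bgzf_eof data/ALIGNMENT/A430001.1.samtools.bam
```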
--
Alex Gutteridge