[Bioc-sig-seq] scanBam Error

Martin Morgan mtmorgan at fhcrc.org
Thu Dec 16 18:45:05 CET 2010


On 12/13/2010 02:00 PM, Dario Strbenac wrote:
> Hi,
> 
> Yes, that works fine, thanks. It must've been a size issue I was having.

Rsamtools 1.2.2 in release has been updated to say

  too many records, use 'param=ScanBamParam(which=<...>)'

when the requested reads would return more than 2^31-1 nucleotides.
The devel version of Rsamtools also currently does this, but the
intention is to arrive at a more robust solution.
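
A rough illustration of that limit, in base R. The figure of 75 million
reads comes from earlier in this thread; the read length of 36 is an
assumption made only for the arithmetic, since the thread does not
state one.

```r
# Largest value a 32-bit signed R integer can hold: 2^31 - 1.
int_max <- .Machine$integer.max

n_reads     <- 75e6  # read count reported earlier in the thread
read_length <- 36    # ASSUMPTION: hypothetical read length, not given in the thread

# Both operands are doubles, so the product itself cannot overflow.
total_nucleotides <- n_reads * read_length

# TRUE means a single call asking for "seq" would exceed the limit.
exceeds_limit <- total_nucleotides > int_max
print(exceeds_limit)

# Largest number of reads of this length that stays under the limit.
max_reads_per_call <- floor(int_max / read_length)
print(max_reads_per_call)
```

If the total exceeds the limit, restricting each call with 'which'
keeps every individual result below it.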

I think this addresses the problem, but would be happy to know if the
original example still fails.

Martin


> 
> ---- Original message ----
>> Date: Mon, 13 Dec 2010 17:31:24 +1000
>> From: Paul Leo <p.leo at uq.edu.au>  
>> Subject: Re: [Bioc-sig-seq] scanBam Error  
>> To: D.Strbenac at garvan.org.au
>> Cc: bioc-sig-sequencing at r-project.org
>>
>>   Do you need all the sequence data at once?
>>
>>   Instead of using a smaller BAM file, can you read in
>>   a smaller portion of your large BAM file?
>>
>>   data.gr <- GRanges(seqnames = paste("chr", 13, sep = ""),
>>                      ranges = IRanges(start = as.numeric(28608234),
>>                                       end = as.numeric(28608363)),
>>                      strand = "+")
>>
>>   which <- data.gr
>>   params <- ScanBamParam(
>>       which = which,
>>       flag = scanBamFlag(isUnmappedQuery = FALSE, isDuplicate = NA,
>>                          isValidVendorRead = TRUE),
>>       simpleCigar = FALSE, reverseComplement = FALSE,
>>       what = c("qname", "flag", "rname", "seq", "strand", "pos", "mpos",
>>                "qwidth", "cigar", "qual", "mapq", "isize", "mrnm"),
>>       tag = "RG")  # change to what you want
>>   aln1 <- scanBam("HS1808.bam", param = params)
>>
>>   aln1[[1]]
>>
>>   That should work fine?
>>
>> --                                                                     
>> Dr Paul Leo                                                            
>> Bioinformatician                                                       
>> UQ Diamantina Institute for Cancer, Immunology and Metabolic Medicine  
>> ---------------------------------------------------------------------  
>> Level 4, R Wing                                                        
>> Princess Alexandra Hospital                                            
>> Ipswich Rd                                                             
>> Woolloongabba QLD 4102                                                 
>> Tel: +61 7 3240 7740  Mob: 041 303 8691  Fax: +61 7 3240 5946          
>> Email: p.leo at uq.edu.au   Web: http://www.di.uq.edu.au                  
>>
>>   -----Original Message-----
>>   From: Dario Strbenac <D.Strbenac at garvan.org.au>
>>   Reply-to: D.Strbenac at garvan.org.au
>>   To: bioc-sig-sequencing at r-project.org
>>   Subject: Re: [Bioc-sig-seq] scanBam Error
>>   Date: Mon, 13 Dec 2010 17:15:38 +1100
>>
>> I tried it out by making a smaller BAM file with only reads from one
>> chromosome, and it worked fine. The full BAM file is 4 GB and has 75
>> million reads in it. Could the size be a problem? Could you test a BAM
>> file of this size on your end, without me sending you one that big?
>> Also, the error is different after I put the ScanBamParam in the right
>> spot:
>>
>> Error in .Call(func, file, index, "rb", NULL, flag, simpleCigar, ...) :
>>   negative length vectors are not allowed
>>
>> Integer overflow somewhere, maybe?
>>
>> - Dario.
>>
>> ---- Original message ----
>>> Date: Sun, 12 Dec 2010 20:59:23 -0800
>>> From: Martin Morgan <mtmorgan at fhcrc.org> 
>>> Subject: Re: [Bioc-sig-seq] scanBam Error 
>>> To: D.Strbenac at garvan.org.au
>>> Cc: bioc-sig-sequencing at r-project.org
>>>
>>> On 12/12/2010 08:00 PM, Dario Strbenac wrote:
>>>> Hello,
>>>>
>>>
>>>> I'm having trouble reading in a BAM file when "seq" is one of the
>>>> strings passed to the what argument of ScanBamParam. If it's not, then
>>>> the reading completes successfully. I don't understand what the
>>>> error means. It is:
>>>>
>>>> Error in .io_bam(.scan_bam, file, index, reverseComplement, tmpl, param = param) :
>>>>   INTEGER() can only be applied to a 'integer', not a 'closure'
>>>>
>>>> The traceback is :
>>>>
>>>>> traceback()
>>>> 4: .Call(func, file, index, "rb", NULL, flag, simpleCigar, ...)
>>>> 3: .io_bam(.scan_bam, file, index, reverseComplement, tmpl, param = param)
>>>> 2: scanBam("HS1808.bam", flag = ScanBamFlag(isDuplicate = FALSE),
>>>>        param = ScanBamParam(reverseComplement = TRUE, what = c("rname",
>>>>            "strand", "pos", "seq")))
>>>> 1: scanBam("HS1808.bam", flag = ScanBamFlag(isDuplicate = FALSE),
>>>>        param = ScanBamParam(reverseComplement = TRUE, what = c("rname",
>>>>            "strand", "pos", "seq")))
>>>>
>>>> and the environment is :
>>>>
>>>> R version 2.12.0 (2010-10-15)
>>>> Platform: x86_64-pc-mingw32/x64 (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                       LC_TIME=English_Australia.1252   
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base    
>>>>
>>>> other attached packages:
>>>> [1] Rsamtools_1.2.1     Biostrings_2.18.0   GenomicRanges_1.2.0 IRanges_1.8.2     
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] Biobase_2.8.0
>>>
>>> Hi Dario -- this is some kind of error in Rsamtools' C code, but I'm not
>>> able to reproduce it on my end so can't track it down. Is there any way
>>> of producing and sharing with me an example file that has this problem?
>>>
>>> One thing (not causing the bug) in your traceback is that 'flag' should
>>> be an argument to ScanBamParam; as it is I think it is being silently
>>> ignored.
>>>
>>> Martin
>>>
>>>>
>>>> --------------------------------------
>>>> Dario Strbenac
>>>> Research Assistant
>>>> Cancer Epigenetics
>>>> Garvan Institute of Medical Research
>>>> Darlinghurst NSW 2010
>>>> Australia
>>>>
>>>> _______________________________________________
>>>> Bioc-sig-sequencing mailing list
>>>> Bioc-sig-sequencing at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>
>>>
>>> --
>>> Computational Biology
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>>
>>> Location: M1-B861
>>> Telephone: 206 667-2793
>>
> 
> 


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


