[Bioc-sig-seq] Rsamtools: Select reads on a chromosome

Martin Morgan mtmorgan at fhcrc.org
Thu Jan 21 05:21:29 CET 2010


Hi Steve --

Steve Lianoglou wrote:
> Hi,
> 
> About selecting all reads on a chromosome:
> 
> On Thu, Jan 7, 2010 at 1:11 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> <snip>
> 
>>> which <- RangesList(chr1=IRanges(start=1, end=247249719))
>>> params <- ScanBamParams(which=which)
>>> reads <- scanBam(my.bam.file, param=params)[[1]]
>>>
>>> Is there are "better" way to do it, eg. w/o making the IRanges object
>>> that's stretches over the chromosome?
>> I don't think so, though 'end' doesn't have to be a literal end, e.g,.
>> .Machine$integer.max and 'stretches' doesn't really involve any cost --
>> just two numbers.
> 
> I just tried to do this in another context, but this actually send R
> into a tailspin, eg:
> 
> R> which <- RangesList(chr1=IRanges(start=1, end=.Machine$integer.max-1))
> R> r <- scanBam('scratch-sorted.bam', param=ScanBamParam(what='pos',
> which=which))
> 
>  *** caught segfault ***
> address 0x0, cause 'unknown'

Yes my suggestion seems right in principle but wrong in practice -- the
limit is 536870912L, and Rsamtools now checks for this. An easy way to
specify 'an entire chromosome' sounds like a reasonable feature request.

Thanks for the report, Steve.

Martin


> 
> Traceback:
>  1: .Call(func, file, index, "rb", list(space(which),
> .uunlist(start(which)),     .uunlist(end(which))), flag, simpleCigar,
> ...)
>  2: .io_bam(.scan_bam, file, index, tmpl, param = param)
>  3: .local(file, index, ...)
>  4: scanBam("scratch-sorted.bam", param = ScanBamParam(what = "pos",
>   which = which))
>  5: scanBam("scratch-sorted.bam", param = ScanBamParam(what = "pos",
>   which = which))
> 
> I have access to chromosome length information, so it's not really a
> problem for me, but it seems as if something is happening which you
> didn't expect, so I thought you'd like to know.
> 
> Thanks,
> -steve
> 
> ps: I'm using IRanges(.., end=.Machine$integer.max-1) because using
> .Machine$integer.max causes an integer overflow
> 


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioc-sig-sequencing mailing list