[Bioc-sig-seq] Rsamtools: Select reads on a chromosome

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Jan 20 23:08:28 CET 2010


Sorry, I forgot to provide sessionInfo():

R version 2.11.0 Under development (unstable) (2009-12-28 r50849)
x86_64-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Rsamtools_0.1.21   BSgenome_1.15.3    Biostrings_2.15.13 IRanges_1.5.26

loaded via a namespace (and not attached):
[1] Biobase_2.7.0


On Wed, Jan 20, 2010 at 5:04 PM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Hi,
>
> About selecting all reads on a chromosome:
>
> On Thu, Jan 7, 2010 at 1:11 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> <snip>
>
>>> which <- RangesList(chr1=IRanges(start=1, end=247249719))
>>> params <- ScanBamParam(which=which)
>>> reads <- scanBam(my.bam.file, param=params)[[1]]
>>>
>>> Is there a "better" way to do it, e.g. w/o making the IRanges object
>>> that stretches over the chromosome?
>>
>> I don't think so, though 'end' doesn't have to be a literal end, e.g.,
>> .Machine$integer.max and 'stretches' doesn't really involve any cost --
>> just two numbers.
>
> I just tried to do this in another context, but this actually sends R
> into a tailspin, e.g.:
>
> R> which <- RangesList(chr1=IRanges(start=1, end=.Machine$integer.max-1))
> R> r <- scanBam('scratch-sorted.bam', param=ScanBamParam(what='pos',
> which=which))
>
>  *** caught segfault ***
> address 0x0, cause 'unknown'
>
> Traceback:
>  1: .Call(func, file, index, "rb", list(space(which),
> .uunlist(start(which)),     .uunlist(end(which))), flag, simpleCigar,
> ...)
>  2: .io_bam(.scan_bam, file, index, tmpl, param = param)
>  3: .local(file, index, ...)
>  4: scanBam("scratch-sorted.bam", param = ScanBamParam(what = "pos",
>  which = which))
>  5: scanBam("scratch-sorted.bam", param = ScanBamParam(what = "pos",
>  which = which))
>
> I have access to chromosome length information, so it's not really a
> problem for me, but it seems as if something is happening which you
> didn't expect, so I thought you'd like to know.
>
> Thanks,
> -steve
>
> ps: I'm using IRanges(.., end=.Machine$integer.max-1) because using
> .Machine$integer.max causes an integer overflow
>
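A minimal sketch of the workaround mentioned above, i.e. taking the
chromosome length from the BAM header rather than using
.Machine$integer.max as the range end. Assumptions: this uses the example
file ex1.bam that ships with Rsamtools as a stand-in for
'scratch-sorted.bam', and it is written against a current Bioconductor
release, where (as far as I know) 'which' is given as a GRanges instead of
the RangesList used in this thread. It illustrates the idea only and says
nothing about the cause of the segfault.

```r
library(Rsamtools)
library(GenomicRanges)

# Example BAM shipped with Rsamtools; a stand-in for 'scratch-sorted.bam'.
bam <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# Named integer vector of reference sequence lengths from the BAM header.
chrom_lengths <- scanBamHeader(bam)[[1]]$targets

# Span one whole reference sequence using its actual length as 'end'.
chrom <- names(chrom_lengths)[1]
which <- GRanges(chrom, IRanges(start = 1L, end = chrom_lengths[[chrom]]))

# Read only the positions of the reads that overlap that range.
param <- ScanBamParam(what = "pos", which = which)
reads <- scanBam(bam, param = param)[[1]]
```

Here reads$pos holds the start positions of the reads on the chosen
sequence, and the range end never has to come near the integer limit.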



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


