[Bioc-sig-seq] Cached GenomicRanges or RangedData Objects?

Charles C. Berry cberry at tajo.ucsd.edu
Mon Oct 11 17:59:17 CEST 2010


On Mon, 11 Oct 2010, Steve Lianoglou wrote:

> Hi Chuck,
>
> On Mon, Oct 11, 2010 at 11:24 AM, Charles C. Berry <cberry at tajo.ucsd.edu> wrote:
>>
>> We are liking the idioms that go with GenomicRanges and RangedData Objects
>> (follow, precede, findOverlaps, etc), but we are bumping up against memory
>> demands of loading very large objects.
>>
>> Is there now or will there soon be a cached version of these that will
>> lessen our memory requirements?
>>
>> If not, is there a cookbook on how to create and save cached versions of
>> these objects?
>>
>> Or maybe a place to look in the Bioconductor codebase to get some ideas of
>> how to go about constructing cached versions of these classes?
>
> I'm not sure what you mean by caching -- do you want them serialized
> to disk and you read off parts when you need them, or?

That's basically the idea. I looked at how BSgenome handles FASTA: it 
lets you read in one chromosome, make apparent copies that are not 
physically copied unless they are modified, and then clean up 
afterwards, with most of that work happening under the hood.
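
A rough sketch of that pattern (ranges stored on disk split by chromosome, one chromosome loaded on demand, the loaded copy shared, and released afterwards) might look like the following. It is written in Python purely as an illustration and is not BSgenome's or GenomicRanges' actual mechanism: `ChromRangeStore` and its methods are hypothetical, chromosome names are assumed to be safe to use as file names, and pickle stands in for whatever serialization a real pipeline would use. As a loose stand-in for copy-on-write, the cached ranges are an immutable tuple, so callers can share one copy but must build a new object to change it.

```python
import os
import pickle
import tempfile
from collections import defaultdict


class ChromRangeStore:
    """Hypothetical disk-backed range store: one pickle file per chromosome.

    Ranges are loaded only when a chromosome is requested and can be
    dropped from memory again with unload(); the files on disk remain.
    """

    def __init__(self, directory):
        self.directory = directory
        self._cache = {}  # chromosome name -> tuple of (start, end) pairs

    def _path(self, chrom):
        # Assumption: chromosome names are safe to use as file names.
        return os.path.join(self.directory, chrom + ".pkl")

    @classmethod
    def from_ranges(cls, ranges, directory):
        """Split (chrom, start, end) records by chromosome and write each to disk.

        This simple version holds the input in memory while splitting;
        a real pipeline would stream it instead.
        """
        by_chrom = defaultdict(list)
        for chrom, start, end in ranges:
            by_chrom[chrom].append((start, end))
        store = cls(directory)
        for chrom, rows in by_chrom.items():
            with open(store._path(chrom), "wb") as fh:
                pickle.dump(sorted(rows), fh)
        return store

    def chromosomes(self):
        """Names of chromosomes available on disk (none need to be loaded)."""
        return sorted(name[:-len(".pkl")] for name in os.listdir(self.directory)
                      if name.endswith(".pkl"))

    def get(self, chrom):
        """Load a chromosome on first access; later calls return the same object."""
        if chrom not in self._cache:
            with open(self._path(chrom), "rb") as fh:
                self._cache[chrom] = tuple(pickle.load(fh))
        return self._cache[chrom]

    def unload(self, chrom):
        """Release the in-memory copy; the file on disk is left untouched."""
        self._cache.pop(chrom, None)

    def loaded(self):
        """Chromosomes currently held in memory."""
        return sorted(self._cache)


# Usage: after the initial split, only requested chromosomes are held in memory.
with tempfile.TemporaryDirectory() as tmp:
    store = ChromRangeStore.from_ranges(
        [("chr2", 50, 80), ("chr1", 10, 20), ("chr1", 5, 8)], tmp)
    print(store.chromosomes())  # ['chr1', 'chr2']
    print(store.get("chr1"))    # ((5, 8), (10, 20))
    print(store.loaded())       # ['chr1']
    store.unload("chr1")
    print(store.loaded())       # []
```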


>
> Also -- I typically split my data and processing to work on a
> chromosome by chromosome basis -- even though the GenomicRanges
> infrastructure allows you to keep ranges spanning multiple chromosomes
> in one object. Although it's a bit more bookkeeping code on my part,
> I find that doing so helps to keep my RAM requirements down a bit.
> Perhaps that obvious/marginal suggestion might help for the time
> being?
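
The chromosome-by-chromosome strategy can be sketched roughly as follows. This is a Python illustration, not GenomicRanges code: `per_chromosome` and `merged_coverage` are hypothetical names, ranges are assumed to be 1-based closed `(start, end)` pairs as in IRanges (width = end - start + 1), and the input records are assumed to be already sorted, or at least grouped, by chromosome, since `itertools.groupby` only merges consecutive records. Only one chromosome's ranges are materialized at a time.

```python
from itertools import groupby
from operator import itemgetter


def merged_coverage(intervals):
    """Total bases covered by closed [start, end] intervals, merging overlaps."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlaps the current run: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def per_chromosome(records, func):
    """Apply func to one chromosome's (start, end) ranges at a time.

    records: iterable of (chrom, start, end), grouped by chromosome.
    """
    for chrom, group in groupby(records, key=itemgetter(0)):
        intervals = [(start, end) for _, start, end in group]
        yield chrom, func(intervals)


records = [("chr1", 1, 10), ("chr1", 5, 20), ("chr1", 30, 31), ("chr2", 100, 100)]
print(dict(per_chromosome(records, merged_coverage)))  # {'chr1': 22, 'chr2': 1}
```

Because `records` can be any iterable, including a generator reading from a file, the whole data set never has to be loaded at once; the cost is the extra bookkeeping of keeping the input grouped by chromosome.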

Thanks. We have bits and pieces of a pipeline that do that. But we are 
about to refactor that pipeline, so the hope is to make something that is 
fairly clean, will endure, and will handle the large objects that new 
sequencing technologies are likely to throw at us.

Chuck
>
> -steve
>
> -- 
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
