[Bioc-sig-seq] rtracklayer and import()ing into GRanges

Ivan Gregoretti ivangreg at gmail.com
Thu Aug 5 15:31:22 CEST 2010


Hi Patrick, Michael and all,

I would be more than happy to test-drive the new function when you are ready.

Thank you,

Ivan





On Thu, Aug 5, 2010 at 3:51 AM, Patrick Aboyoun <paboyoun at fhcrc.org> wrote:
> Michael,
> Breaking this down to two issues:
>
> Filtering
> Martin has been working on improving filtering in the ShortRead package,
> moving from a read-everything-then-filter approach to filtering based on
> block processing. Lessons learned there can be brought to rtracklayer for
> large BED files and the like.
>
> import() output class
> Keeping the same API and just switching the import methods from producing
> RangedData (or UCSCData) output to GRanges output will break backward
> compatibility because the RangedData API is not wholly applicable to GRanges
> objects. I would not recommend this course since a number of packages in
> BioC and scripts in the wild expect the import methods to produce a
> RangedData (or UCSCData) object. An additional argument is not that onerous
> and can be phased out over the course of two or three releases (1 - 1.5
> years). Another alternative is to add a new import function (read.GRanges?)
> to rtracklayer that shares the same infrastructure as the existing import
> methods.
>
> I have a local copy of rtracklayer where I added a new asRangedData flag to
> the GenomicData function and import.gff* methods. I'll sit on this for now
> since these changes didn't take a lot of work. This is one of those situations
> where managing the life cycle of the function specs is trickier than
> making the desired code changes.
>
>
> Cheers,
> Patrick
>
>
> On 8/4/10 8:24 PM, Michael Lawrence wrote:
>
> This might work, but it seems like an expensive optimization in that it
> changes a lot of the API. If someone cannot make a single copy of the data,
> it's unlikely that they're even going to be able to get to GenomicData() or
> manipulate it later. Perhaps the coercion function needs some simple tweaks?
> The filter support would definitely help. I'd rather keep things simple and
> return a single type, and GRanges sounds most appropriate.
>
> But I'm open to suggestions and further argument.
>
> Michael
>
> On Wed, Aug 4, 2010 at 2:05 PM, Patrick Aboyoun <paboyoun at fhcrc.org> wrote:
>>
>> Michael,
>> How integrated would you like to see the GRanges class in rtracklayer? The
>> rtracklayer::GenomicData constructor is the master instantiator. I would
>> like to add an asRangedData = TRUE (default) argument to the GenomicData
>> function and push it all the way up through the import functions, so that
>> when the user sets asRangedData = FALSE, the GenomicData function creates a
>> GRanges object. This is what we did with the
>> {matchPWM,vmatchPattern,vmatchPDict},BSgenome-methods in the BSgenome
>> package, and it is as good a solution as any. This is a straightforward
>> change and wouldn't take too long to complete.
>>
>>
>> Patrick
>>
>>
>> On 8/4/10 12:56 PM, Michael Lawrence wrote:
>>>
>>> GRanges support is definitely on the TODO list. Filters are a good idea
>>> and also on the TODO list, possibly with a chunk size parameter to enable
>>> chunk processing.
>>>
>>> I'd love to have the GRanges stuff at least done by the next release.
>>> Patches welcome, of course :)
>>>
>>> Michael
>>>
>>> On Wed, Aug 4, 2010 at 8:08 AM, Ivan Gregoretti <ivangreg at gmail.com> wrote:
>>>
>>>
>>>>
>>>> Hello Michael and everyone,
>>>>
>>>> Would you please consider adding to import() the capacity to generate
>>>> a GRanges object rather than the default RangedData object?
>>>>
>>>> Also,
>>>>
>>>> Wouldn't it be great to be able to import() with filters just like
>>>> with readAligned()?
>>>>
>>>>
>>>>
>>>> Justification
>>>>
>>>> GRanges is a biology-aware container. When importing large BEDs into
>>>> R, the current workflow involves creating RangedData first and then
>>>> converting to GRanges.
>>>>
>>>> If the BEDs are really big, holding both objects in memory at any
>>>> point in time is a hardware challenge.
>>>>
>>>> The capacity to filter the input would help in this case and in
>>>> general it would provide an increase in efficiency.
>>>>
>>>>
>>>> Thank you,
>>>>
>>>> Ivan
>>>>
>>>>
>>>>
>>>>
>>>> Ivan Gregoretti, PhD
>>>> National Institute of Diabetes and Digestive and Kidney Diseases
>>>> National Institutes of Health
>>>> 5 Memorial Dr, Building 5, Room 205.
>>>> Bethesda, MD 20892. USA.
>>>> Phone: 1-301-496-1016 and 1-301-496-1592
>>>> Fax: 1-301-496-9878
>>>>
>>>> _______________________________________________
>>>> Bioc-sig-sequencing mailing list
>>>> Bioc-sig-sequencing at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>>
>>>>
>>>
>>>
>>
>
>
>
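
For anyone reading this thread later, the asRangedData idea discussed above might look roughly like the following. This is only a base-R sketch under loud assumptions: make_genomic_data(), import_bed_sketch() and the "LegacyRanges"/"NewRanges" class tags are made-up stand-ins for illustration, not rtracklayer or GenomicRanges code. The point is just that the flag defaults to the old output class and is forwarded unchanged from the importer down to the constructor.

```r
# Hypothetical stand-in for a constructor with an output-class switch.
# Nothing here calls rtracklayer; the class names are plain S3 tags.
make_genomic_data <- function(chrom, start, end, asRangedData = TRUE) {
  stopifnot(length(chrom) == length(start), length(start) == length(end))
  ranges <- data.frame(chrom = chrom, start = start, end = end,
                       stringsAsFactors = FALSE)
  # The default keeps the legacy class, so existing callers see no change.
  class(ranges) <- c(if (asRangedData) "LegacyRanges" else "NewRanges",
                     "data.frame")
  ranges
}

# The importer only parses the file and forwards the flag to the constructor.
import_bed_sketch <- function(path, asRangedData = TRUE) {
  bed <- read.table(path, header = FALSE, sep = "\t",
                    col.names = c("chrom", "start", "end"),
                    colClasses = c("character", "integer", "integer"))
  # BED starts are 0-based; shift them to 1-based closed intervals.
  make_genomic_data(bed$chrom, bed$start + 1L, bed$end,
                    asRangedData = asRangedData)
}

tmp <- tempfile(fileext = ".bed")
writeLines(c("chr1\t0\t100", "chr2\t50\t150"), tmp)

old_style <- import_bed_sketch(tmp)                        # legacy default
new_style <- import_bed_sketch(tmp, asRangedData = FALSE)  # opt in to new class
class(old_style)[1]  # "LegacyRanges"
class(new_style)[1]  # "NewRanges"
```

Because the default is TRUE, scripts that never mention the flag keep getting the old class, which is the backward-compatibility argument made in the thread.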

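The block-processing style of filtering mentioned in the thread could be sketched along these lines. Again this is plain base R and only an assumption about what such a filter might look like: read_bed_filtered(), keep and chunk_size are hypothetical names, and neither ShortRead nor rtracklayer is used. The file is read a block at a time and each block is filtered before it is stored, so rejected rows are never all held in memory at once.

```r
# Read a three-column BED-like file in blocks, keeping rows that pass `keep`.
# `keep` takes a data.frame block and returns a logical vector, one per row.
read_bed_filtered <- function(path, keep, chunk_size = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  kept <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break       # end of file
    lines <- lines[nzchar(lines)]        # drop blank lines
    if (length(lines) == 0L) next
    block <- read.table(text = lines, header = FALSE, sep = "\t",
                        col.names = c("chrom", "start", "end"),
                        colClasses = c("character", "integer", "integer"))
    # Filter before accumulating, so only passing rows are retained.
    kept[[length(kept) + 1L]] <- block[keep(block), , drop = FALSE]
  }
  if (length(kept) == 0L) {
    return(data.frame(chrom = character(), start = integer(),
                      end = integer()))
  }
  out <- do.call(rbind, kept)
  rownames(out) <- NULL
  out
}

tmp <- tempfile(fileext = ".bed")
writeLines(c("chr1\t0\t100", "chr2\t50\t150", "chr1\t200\t300"), tmp)

chr1_only <- read_bed_filtered(tmp, keep = function(b) b$chrom == "chr1",
                               chunk_size = 2L)
nrow(chr1_only)  # 2
```

A small chunk_size is used here only to force more than one block on a three-line file; for real BED files a much larger value would make sense.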

