[BioC] GRanges - reduce() function
Jason Ross
jason.ross at csiro.au
Wed Nov 30 02:44:32 CET 2011
Martin Morgan <mtmorgan at ...> writes:
>
> On 11/17/2011 05:57 PM, Jason Ross wrote:
> > Hi Fahim,
> >
> > I am also frustrated by this. The meta-data also vanishes when using
> > findOverlaps(). I'm thinking of writing some wrapper functions to place the
> > meta-data back into the GRanges object.
>
> Hi Jason et al.,
>
> The problem in 'reduce' is that the elementMetadata columns need to be
> 'reduce'd too, and there is no universal way to do that -- for
> 'transcripts' in Fahim's example, maybe it's just collapsing entries
> into a CharacterList, whereas for "Gene" it's split-by-reduced-range and
> 'unique'. For numeric values one might sum or mean or max or ....
>
> Can you be more specific about findOverlaps? It's not really clear which
> data you'd like to have propagated.
>
> For Fahim's question, I arrived at
>
> values(r)[["Gene"]] <-
> tapply(values(gr)[["Gene"]], match(gr, r), unique)
>
> which I think is quite robust, but I'd recommend checking carefully on
> complicated data.
>
> Martin
>
Hi Martin,
I tend to use GenomicRanges objects a lot for annotating features, so I want
functionality like R's merge() or an SQL join. I was joining data to annotations
using MySQL but found the indices broke with range joins. I considered BEDtools
but didn't like the constraints of only using BED/GFF and the shell. I switched
to using GenomicRanges and findOverlaps as I liked the very efficient interval
tree approach. I usually wrap the output of findOverlaps in a function that
emulates a left or inner join of two data frames. BEDtools handles this process
natively, but rather inelegantly. GRanges is more powerful but doesn't offer
boolean switches on union/intersect, etc., or wrapper functions that keep the
metadata.
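A minimal sketch of the kind of inner-join wrapper described above, written
against the current GenomicRanges API (queryHits(), subjectHits() and mcols(),
which postdate this thread). The function name overlapInnerJoin and the toy data
are hypothetical, made up purely for illustration.

```r
library(GenomicRanges)

# Hypothetical helper: inner join of two GRanges objects by range overlap.
# Returns one range per (query, subject) overlap, carrying the metadata
# columns of both objects.
overlapInnerJoin <- function(query, subject) {
  hits <- findOverlaps(query, subject)
  q <- queryHits(hits)    # indices into query
  s <- subjectHits(hits)  # indices into subject
  joined <- query[q]
  mcols(joined) <- cbind(mcols(query)[q, , drop = FALSE],
                         mcols(subject)[s, , drop = FALSE])
  joined
}

# Toy data (invented for illustration)
features <- GRanges("chr1", IRanges(c(1, 50, 200), width = 10),
                    id = c("f1", "f2", "f3"))
genes <- GRanges("chr1", IRanges(c(5, 45), c(20, 60)),
                 Gene = c("A", "B"))

overlapInnerJoin(features, genes)
```

A left join would additionally have to append the query ranges that never
appear in queryHits(hits), with NA in the subject columns.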
I appreciate that there is no universal way to disentangle metadata when
aggregating, but it would be nice to have some of the options available in the
union/intersection/reduce functions, or in wrapper functions. At the moment I
roll my own.
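One way such a wrapper for reduce() might look, sketched with the with.revmap
argument that current GenomicRanges releases provide (it was not available when
this thread was written, so treat it as an assumption about the installed
version). It collapses a "Gene" column to the unique values per reduced range,
in the spirit of Martin's tapply() suggestion; the toy data are invented.

```r
library(GenomicRanges)

# Toy data (invented for illustration): the first two ranges overlap
gr <- GRanges("chr1", IRanges(c(1, 5, 100), c(10, 20, 110)),
              Gene = c("A", "B", "C"))

# revmap records, for each reduced range, the indices of the input
# ranges that were merged into it
r <- reduce(gr, with.revmap = TRUE)
revmap <- mcols(r)$revmap

# Regroup the Gene values by reduced range and keep the unique ones
mcols(r)$Gene <- unique(extractList(mcols(gr)$Gene, revmap))
r
```

For a numeric column the same regrouping step could be followed by sum(),
mean() or max() instead of unique().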
Regardless, I find GenomicRanges, etc. to be very useful and powerful, and it's
my preferred strategy for dealing with genomic data.
Cheers,
Jason.
At first I created GRanges objects from the dataframes