[BioC] GenomicRanges::reduce feature request
Steve Lianoglou
lianoglou.steve at gene.com
Wed Aug 7 20:36:54 CEST 2013
Hi,
On Wed, Aug 7, 2013 at 11:29 AM, Zhu, Lihua (Julie)
<Julie.Zhu at umassmed.edu> wrote:
> Hi,
>
> The reduce function is very useful for joining neighboring ranges. However, the score information is lost after applying reduce. Is it possible to retain the score information after applying reduce?
>
> Here is an example.
>
> library(GenomicRanges)
>
> rd <- RangedData(
> RangesList(
> chrA=IRanges(start=c(1, 4, 6), width=c(3, 2, 4)),
> chrB=IRanges(start=c(1, 3, 6), width=c(3, 3, 4))),
> score=c(2, 7, 3, 1, 1, 1))
> rd
> RangedData with 6 rows and 1 value column across 2 spaces
> space ranges | score
> <factor> <IRanges> | <numeric>
> 1 chrA [1, 3] | 2
> 2 chrA [4, 5] | 7
> 3 chrA [6, 9] | 3
> 4 chrB [1, 3] | 1
> 5 chrB [3, 5] | 1
> 6 chrB [6, 9] | 1
>
> reduce(rd, min.gap=1)
> RangedData with 2 rows and 0 value columns across 2 spaces
> space ranges |
> <factor> <IRanges> |
> 1 chrA [1, 9] |
> 2 chrB [1, 9] |
>
> Please note that the score column is missing after applying reduce. The following is the desired output, with the score information retained.
> space ranges | score
> <factor> <IRanges> | <numeric>
> 1 chrA [1, 9] | 12
> 2 chrB [1, 9] | 3
I believe similar topics have come up before. The problem is that I
don't think there's any general rule that can apply to merging the
`mcols` of merged/reduced ranges.
I guess the rule you would like to apply here is to sum the score(s)
from all the combined ranges -- but why sum? One might want to average
... or take a weighted average based on length of the combined ranges,
or geometric mean, or ...
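Whichever summary is wanted, it can be computed by hand once you know which input ranges went into each reduced range. A minimal sketch, assuming a GenomicRanges release in which `reduce()` accepts `with.revmap=TRUE` (the argument name may differ in other releases), and using a `GRanges` rebuilt from the example above rather than the original `RangedData`; the column names `score_sum`, `score_mean` and `score_wmean` are made up for illustration:

```r
library(GenomicRanges)

# Same ranges and scores as the RangedData example, as a GRanges
gr <- GRanges(
  seqnames = rep(c("chrA", "chrB"), each = 3),
  ranges = IRanges(start = c(1, 4, 6, 1, 3, 6),
                   width = c(3, 2, 4, 3, 3, 4)),
  score = c(2, 7, 3, 1, 1, 1)
)

# revmap records, for each reduced range, the indices of the input
# ranges that were merged into it
red <- reduce(gr, min.gapwidth = 1L, with.revmap = TRUE)

# Group the per-range values by reduced range, then summarize each group
scores <- extractList(gr$score, red$revmap)
widths <- extractList(width(gr), red$revmap)

red$score_sum   <- sum(scores)                    # chrA: 12, chrB: 3
red$score_mean  <- mean(scores)                   # plain average
red$score_wmean <- sum(scores * widths) / sum(widths)  # width-weighted average
red
```

Swapping `sum()` for another summary is a one-line change, which is why the choice is left to the caller rather than built into `reduce()`.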
What would be the right thing to do here if the ranges being merged
had categorical `mcols` data?
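One possible answer for categorical data, offered only as an illustration, is to keep the set of distinct labels seen in each merged range. The sketch below assumes `reduce()` accepts `with.revmap=TRUE` in the installed GenomicRanges release; the `type` column and its values are invented for the example:

```r
library(GenomicRanges)

# Hypothetical categorical annotation on the same six ranges
gr_cat <- GRanges(
  seqnames = rep(c("chrA", "chrB"), each = 3),
  ranges = IRanges(start = c(1, 4, 6, 1, 3, 6),
                   width = c(3, 2, 4, 3, 3, 4)),
  type = c("exon", "exon", "intron", "utr", "utr", "utr")
)

red_cat <- reduce(gr_cat, min.gapwidth = 1L, with.revmap = TRUE)

# Collect the labels of the merged ranges, drop duplicates within each
# group, and paste what is left into one comma-separated string
labels <- unique(extractList(gr_cat$type, red_cat$revmap))
red_cat$type <- unstrsplit(labels, sep = ",")
red_cat
```

Other policies (most frequent label, label of the widest range, dropping the column when labels disagree) are equally defensible, which is the point of the question.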
-steve
--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech