[Bioc-sig-seq] Add ability for `subset`ing IRanges-like objects based on their elementMetadata?

Patrick Aboyoun paboyoun at fhcrc.org
Sat Jun 5 08:51:06 CEST 2010


Great thread on the subset function. It currently has two IRanges-based
methods:

 > showMethods("subset")
Function: subset (package base)
x="ANY"
x="DataTable"
x="Sequence"

Based on what was being discussed, I see two enhancement requests:

1) Expanding the scope of subset to allow reference to components of 
non-DataTable objects such as IRanges and GRanges instances:

## Currently not supported, but could be
ir <- IRanges(start = 1:10, end = 1:10)
subset(ir, start < 5)

2) Add support for subsetting by 'logical' Rle in the subset function.
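For request 2, the change would amount to expanding a run-length-encoded logical predicate into an ordinary logical vector before indexing. Below is a minimal sketch. It uses base R's `rle()` objects as a stand-in for IRanges' `Rle` class and a hypothetical helper, `as_keep_vector()`. A real `Rle` would presumably be decoded with its own coercion, such as `as.vector()`.

```r
## Stand-in for a logical Rle: a base R "rle" object (NOT IRanges' Rle class)
keep_rle <- rle(c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE))

## Hypothetical helper: normalize a predicate result to a plain logical
## vector, expanding run-length encodings and treating NA as FALSE
as_keep_vector <- function(keep) {
    if (inherits(keep, "rle"))
        keep <- inverse.rle(keep)
    keep & !is.na(keep)
}

x <- letters[1:6]
x[as_keep_vector(keep_rle)]  # "a" "b" "f"
```

Because the expansion happens inside the helper, a subset method would only need one extra call before indexing, which is why the change can live entirely inside the existing methods.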

The second request is straightforward to implement since it can be done 
within the subset methods of the Sequence and DataTable virtual classes. 
If we limit the first to Ranges (virtual class) and GRanges (which 
doesn't inherit from Ranges) objects, then two more subset methods would 
suffice to achieve 1). Sound reasonable?
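Below is a rough sketch of what such a method could look like for request 1. It uses a hypothetical S3 class, `toy_ranges`, in place of the S4 `Ranges` class so that the example runs with base R alone. The predicate is evaluated with `start`, `end` and `width` bound as variables, much as `subset.data.frame` evaluates its predicate with the data.frame's columns in scope.

```r
## Hypothetical toy class standing in for IRanges (not the real class):
## a list holding integer start and end vectors
toy_ranges <- function(start, end) {
    structure(list(start = as.integer(start), end = as.integer(end)),
              class = "toy_ranges")
}

## Hypothetical subset() method: expose start, end and width as variables
## the predicate can refer to, then keep the ranges where it is TRUE
subset.toy_ranges <- function(x, subset, ...) {
    vars <- list(start = x$start, end = x$end,
                 width = x$end - x$start + 1L)
    keep <- eval(substitute(subset), vars, parent.frame())
    keep <- keep & !is.na(keep)  # treat NA as FALSE, as base::subset does
    toy_ranges(x$start[keep], x$end[keep])
}

ir <- toy_ranges(start = 1:10, end = 1:10)
res <- subset(ir, start < 5)
res$start  # 1 2 3 4
```

A real method for `Ranges` would be defined with `setMethod()` and would return a `Ranges` object. The toy class only illustrates the evaluation step.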


Patrick


On 6/4/10 10:06 PM, Steve Lianoglou wrote:
> Hi Vincent,
>
>    
>> the simplification that Steve
>> seems to be asking for would
>> allow implicit references to elementMetadata variables in the predicate.  I
>> am not in favor of such
>> an extension of semantics of bracket.
>>      
> Just to be clear, I'm not suggesting referencing elementMetadata
> variables implicitly within brackets, but only when using
> `subset`, the way `subset` already works with the columns of a
> data.frame (when it's used *on* a data.frame).
>
> So, using your example gr object:
>
> GRanges with 10 ranges and 2 elementMetadata values
>   seqnames    ranges strand |     score        GC
>      <Rle> <IRanges>  <Rle> | <integer> <numeric>
> a   Chrom1  [ 1, 10]      - |         1 1.0000000
> b   Chrom2  [ 2, 10]      + |         2 0.8888889
> c   Chrom2  [ 3, 10]      + |         3 0.7777778
> d   Chrom2  [ 4, 10]      * |         4 0.6666667
> e   Chrom1  [ 5, 10]      * |         5 0.5555556
> f   Chrom1  [ 6, 10]      + |         6 0.4444444
> g   Chrom3  [ 7, 10]      + |         7 0.3333333
> h   Chrom3  [ 8, 10]      + |         8 0.2222222
> i   Chrom3  [ 9, 10]      - |         9 0.1111111
> j   Chrom3  [10, 10]      - |        10 0.0000000
>
> seqlengths
>   Chrom1 Chrom2 Chrom3
>      NA     NA     NA
>
> I was curious if this would be useful:
>
> R> subset(gr, strand == "+" & score > 6)
>
> but I wasn't trying to propose having something like this:
>
> R> gr[strand == "+" & score > 6]
>
>
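Steve's proposed call mirrors what base `subset()` already does on a data.frame. As a rough illustration, the sketch below uses a plain data.frame as a stand-in for the GRanges object, not the real class, so it runs without GenomicRanges. The data.frame is built from the listing above, and the same predicate keeps ranges g and h.

```r
## Stand-in for Vincent's gr object: a plain data.frame holding the range,
## strand and elementMetadata columns (score, GC) from the listing above
gr_df <- data.frame(
    seqnames = c("Chrom1", "Chrom2", "Chrom2", "Chrom2", "Chrom1",
                 "Chrom1", "Chrom3", "Chrom3", "Chrom3", "Chrom3"),
    start = 1:10, end = 10L,
    strand = c("-", "+", "+", "*", "*", "+", "+", "+", "-", "-"),
    score = 1:10, GC = (10:1 - 1) / 9,
    row.names = letters[1:10]
)

## Columns are referenced by bare name inside the predicate
res <- subset(gr_df, strand == "+" & score > 6)
rownames(res)  # "g" "h"
```

This shows only the data.frame semantics the proposal borrows. It does not exercise any GRanges method.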



More information about the Bioc-sig-sequencing mailing list