[Bioc-sig-seq] Add ability for `subset`ing IRanges-like objects based on their elementMetadata?

Patrick Aboyoun paboyoun at fhcrc.org
Sat Jun 5 19:30:06 CEST 2010


There is a lot of meat here that I can't properly address now because I 
am heading out to serve as a BioC evangelist in Europe. I was looking 
over the as.env methods that you created, Michael, and I agree it would be 
useful to expand on this to support Rle's. I probably won't be 
able to do much work on this until late June, but, Michael, feel free to 
rework this as you see fit.
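For readers following the thread, here is a minimal base-R sketch of the idea under discussion: turning an object into an environment (in the spirit of the as.env methods mentioned above) so that a subset() predicate can refer to components such as start, strand or score by bare name, and accepting a run-length-encoded logical as the predicate's result. It assumes nothing from IRanges: toy_ranges(), as_env_toy() and subset_ranges() are hypothetical helpers, and base R's rle() objects stand in for a logical Rle. It is an illustration only, not how IRanges implements subset.

```r
## Hypothetical sketch in base R (no IRanges required). toy_ranges(),
## as_env_toy() and subset_ranges() are illustrative names, not IRanges
## API; base rle() objects stand in for a logical Rle.

## Toy range-like container: start/end plus a data.frame of metadata
toy_ranges <- function(start, end, ...) {
  meta <- if (length(list(...)) > 0L) {
    data.frame(..., stringsAsFactors = FALSE)
  } else {
    data.frame(row.names = seq_along(start))
  }
  stopifnot(length(start) == length(end), nrow(meta) == length(start))
  structure(list(start = start, end = end, meta = meta),
            class = "toy_ranges")
}

## as.env-style helper: expose start, end, width and every metadata
## column as variables in an environment whose parent is `enclos`
as_env_toy <- function(x, enclos) {
  env <- new.env(parent = enclos)
  assign("start", x$start, envir = env)
  assign("end", x$end, envir = env)
  assign("width", x$end - x$start + 1L, envir = env)
  for (nm in names(x$meta)) assign(nm, x$meta[[nm]], envir = env)
  env
}

subset_ranges <- function(x, subset) {
  caller <- parent.frame()
  ## Capture the predicate unevaluated and evaluate it with the object's
  ## components in scope; other symbols resolve in the caller's frame
  keep <- eval(substitute(subset), as_env_toy(x, caller))
  ## Expand a run-length-encoded logical to a plain logical vector
  if (inherits(keep, "rle")) keep <- inverse.rle(keep)
  if (!is.logical(keep) || length(keep) != length(x$start))
    stop("'subset' must evaluate to a logical vector as long as 'x'")
  keep <- keep & !is.na(keep)  # treat NA as FALSE, like base subset()
  structure(list(start = x$start[keep], end = x$end[keep],
                 meta = x$meta[keep, , drop = FALSE]),
            class = "toy_ranges")
}

## Usage, loosely modelled on Steve's example gr object
gr <- toy_ranges(start = 1:10, end = rep(10L, 10),
                 strand = c("-", "+", "+", "*", "*", "+", "+", "+", "-", "-"),
                 score = 1:10)
subset_ranges(gr, start < 5)$start                  # 1 2 3 4
subset_ranges(gr, strand == "+" & score > 6)$start  # 7 8
subset_ranges(gr, rle(score > 6))$meta$score        # 7 8 9 10
```

The design mirrors base R's subset.data.frame: the predicate is captured with substitute() and evaluated with the object's components in scope, so plain bracket indexing keeps its usual semantics while only subset() gains the implicit references.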


Cheers,
Patrick


On 6/5/10 9:18 AM, Charles C. Berry wrote:
> On Fri, 4 Jun 2010, Patrick Aboyoun wrote:
>
>> Great thread on the subset function. It currently has two 
>> IRanges-based methods:
>>
>>>  showMethods("subset")
>> Function: subset (package base)
>> x="ANY"
>> x="DataTable"
>> x="Sequence"
>>
>> Based on what was being discussed, I see two enhancement requests:
>>
>> 1) Expanding the scope of subset to allow reference to components of 
>> non-DataTable objects such as IRanges and GRanges instances:
>>
>> ## Currently not supported, but could be
>> ir <- IRanges(start = 1:10, end = 1:10)
>> subset(ir, start < 5)
>>
>> 2) Add support for subsetting by 'logical' Rle in the subset function.
>>
>> The second request is straightforward to implement since it can be 
>> done within the subset methods of the Sequence and DataTable virtual 
>> classes. If we limit the first to Ranges (virtual class) and GRanges 
>> (which doesn't inherit from Ranges) objects, then two more subset 
>> methods would suffice to achieve 1). Sound reasonable?
>>
>>
>> Patrick
>
> Perhaps this request pertaining to xtabs(..., subset = ...) is related.
>
> Currently (rather, in IRanges_1.6.4)
>
>> library(IRanges)
>> ir <- RangedData(IRanges(start=1:10,width=1),space=rep(letters[1:2],5),z=rep(1:3,length=10))
>>
>> xtabs(~z,as.data.frame(ir),subset = z > 1)
> z
> 2 3
> 3 3
>> xtabs(~z,subset(ir,z>1))
> z
> 2 3
> 3 3
>>
>> xtabs(~z,ir,subset = z > 1)
> Error in xj[i] : invalid subscript type 'closure'
>>
>> xtabs(~z,subset(ir,space=='a'))
> z
> 1 2 3
> 2 1 2
>> xtabs(~z,ir,subset = space=='a')
> Error in xj[i] : invalid subscript type 'closure'
>>
>
> Can this be changed to allow use of the subset argument when the data 
> arg is a RangedData (or GRanges) instance?
>
> Thanks,
>
> Chuck
>
>>
>>
>> On 6/4/10 10:06 PM, Steve Lianoglou wrote:
>>>  Hi Vincent,
>>>
>>>
>>> >  the simplification that Steve
>>> >  seems to be asking for would
>>> >  allow implicit references to elementMetadata variables in the
>>> >  predicate. I am not in favor of such
>>> >  an extension of semantics of bracket.
>>> >
>>>  Just to be clear, I'm not suggesting referencing elementMetadata
>>>  variables implicitly within brackets, but rather only when using
>>>  `subset`, just as `subset` already does with the columns of a
>>>  data.frame when it is called on a data.frame.
>>>
>>>  So, using your example gr object:
>>>
>>>  GRanges with 10 ranges and 2 elementMetadata values
>>>    seqnames    ranges strand |     score        GC
>>>       <Rle> <IRanges>  <Rle> | <integer> <numeric>
>>>  a   Chrom1  [ 1, 10]      - |         1 1.0000000
>>>  b   Chrom2  [ 2, 10]      + |         2 0.8888889
>>>  c   Chrom2  [ 3, 10]      + |         3 0.7777778
>>>  d   Chrom2  [ 4, 10]      * |         4 0.6666667
>>>  e   Chrom1  [ 5, 10]      * |         5 0.5555556
>>>  f   Chrom1  [ 6, 10]      + |         6 0.4444444
>>>  g   Chrom3  [ 7, 10]      + |         7 0.3333333
>>>  h   Chrom3  [ 8, 10]      + |         8 0.2222222
>>>  i   Chrom3  [ 9, 10]      - |         9 0.1111111
>>>  j   Chrom3  [10, 10]      - |        10 0.0000000
>>>
>>>  seqlengths
>>>    Chrom1 Chrom2 Chrom3
>>>       NA     NA     NA
>>>
>>>  I was curious if this would be useful:
>>>
>>> R> subset(gr, strand == "+" & score > 6)
>>>
>>>  but I wasn't trying to propose having something like this:
>>>
>>> R> gr[strand == "+" & score > 6]
>>>
>>>
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>
> Charles C. Berry                            (858) 534-2098
>                                             Dept of Family/Preventive Medicine
> E mailto:cberry at tajo.ucsd.edu                UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>
>


