[Bioc-sig-seq] RangedData objects. Redefining widths with conditions.

Ivan Gregoretti ivangreg at gmail.com
Fri Apr 23 20:28:53 CEST 2010


Hi Michael,

With the GRanges object, resizing becomes a breeze. Thank you.

For the purpose of leaving this operation documented, I will
copy/paste my minimalist code:


library(rtracklayer) # needed by import()
library(BSgenome.Mmusculus.UCSC.mm9) # needed for chromosome lengths

# load the features
A <- import('hundredmilliontags.bed.gz', 'bed')

# coerce to GRanges
A <- as(A, 'GRanges')

# Be elegant, supply chromosome lengths
seqlengths(A) <- seqlengths(Mmusculus)[names(seqlengths(A))]

# voila, proper resizing
resize(A, width=200)


Ivan

Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health


On Fri, Apr 23, 2010 at 11:08 AM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
>
>
> On Fri, Apr 23, 2010 at 7:42 AM, Ivan Gregoretti <ivangreg at gmail.com> wrote:
>>
>> Hi Steve,
>>
>> What you showed worked, no question, but I found resize() to be
>> unprepared for convenient use with RangedData objects.
>>
>> For example, consider a more biological set of data:
>>
>> Z <- RangedData(
>>       RangesList(
>>          chrA = IRanges(start = c(1, 4, 6), width=c(3, 2, 4)),
>>          chrB = IRanges(start = c(1, 3, 6), width=c(3, 3, 4))),
>>       score = c( 2, 7, 3, 1, 1, 1 ),
>>       strand= c('+','+','-','+','-','-') )
>>
>> > Z
>> RangedData with 6 rows and 2 value columns across 2 spaces
>>        space    ranges |     score      strand
>>  <character> <IRanges> | <numeric> <character>
>> 1        chrA    [1, 3] |         2           +
>> 2        chrA    [4, 5] |         7           +
>> 3        chrA    [6, 9] |         3           -
>> 4        chrB    [1, 3] |         1           +
>> 5        chrB    [3, 5] |         1           -
>> 6        chrB    [6, 9] |         1           -
>>
>> Here is the inconvenience with resize():
>>
>> resize(Z, width=200, fix=ifelse(Z$strand=='+','start','end'))
>> Error in function (classes, fdef, mtable)  :
>>  unable to find an inherited method for function "resize", for
>> signature "RangedData"
>>
>> What does work is ranges(Z) rather than Z itself:
>> > resize(ranges(Z), width=200, fix=ifelse(Z$strand=='+','start','end'))
>> SimpleRangesList of length 2
>> $chrA
>> IRanges of length 3
>>    start end width
>> [1]     1 200   200
>> [2]     4 203   200
>> [3]  -190   9   200
>>
>> $chrB
>> IRanges of length 3
>>    start end width
>> [1]     1 200   200
>> [2]     3 202   200
>> [3]  -190   9   200
>>
>> But as you can see, the RangedData object is lost. You have to coerce it:
>>
>> > as(resize(ranges(Z), width=200,
>> > fix=ifelse(Z$strand=='+','start','end')), 'RangedData')
>> RangedData with 6 rows and 0 value columns across 2 spaces
>>        space      ranges |
>>  <character>   <IRanges> |
>> 1        chrA [   1, 200] |
>> 2        chrA [   4, 203] |
>> 3        chrA [-190,   9] |
>> 4        chrB [   1, 200] |
>> 5        chrB [   3, 202] |
>> 6        chrB [-190,   9] |
>>
>> Now I got a RangedData object but the value columns are still lost. I
>> have to reconstruct it.
>>
>> [warning: the following command is obnoxious]
>>
>>
>> > as(cbind(as.data.frame(as(resize(ranges(Z), width=200,
>> > fix=ifelse(Z$strand=='+','start','end')), 'RangedData')),
>> > as.data.frame(Z)[,5:ncol(as.data.frame(Z))]), 'RangedData')
>> RangedData with 6 rows and 2 value columns across 2 spaces
>>        space      ranges |     score   strand
>>  <character>   <IRanges> | <numeric> <factor>
>> 1        chrA [   1, 200] |         2        +
>> 2        chrA [   4, 203] |         7        +
>> 3        chrA [-190,   9] |         3        -
>> 4        chrB [   1, 200] |         1        +
>> 5        chrB [   3, 202] |         1        -
>> 6        chrB [-190,   9] |         1        -
>>
>> Granted, it works, but wouldn't this be more convenient?
>>
>> resize(Z, width=200, fix=ifelse(Z$strand=='+','start','end'))
>>
>> Z is a tiny toy example; biological sets regularly have millions of
>> rows. My set is over 100 million rows; as I write this, my 144GB RAM
>> machine is doing the resizing the 'long way round', as obnoxiously
>> shown. Still working...
>>
>> I wonder if there is a 'cheaper' way to resize a large RangedData
>> instance. A better solution would be to upgrade resize(), but I am not
>> that R-skilled. I hope the developers will consider it.
>>
>
> This would be a simple addition, but there is the bigger question of whether
> RangedData should implement the Ranges API. It's really more of a "dataset
> with ranges" than "ranges with data". RangedData *does* implement the
> findOverlaps family of functions since they are used so commonly. There are
> also "short cuts" to the starts, ends and widths.
>
> You might find GRanges more convenient for your use-case. resize,GRanges
> automatically considers the strand in the expected way.
>
> Also, there is a short-cut like:
>
> resizedRanges <- resize(ranges(Z), width=200, fix=ifelse(Z$strand=='+',
> 'start','end'))
> ranges(Z) <- resizedRanges
>
> Michael
>
>>
>> Thank you,
>>
>> Ivan
>>
>> Ivan Gregoretti, PhD
>> National Institute of Diabetes and Digestive and Kidney Diseases
>> National Institutes of Health
>>
>>
>>
>> On Thu, Apr 22, 2010 at 5:11 PM, Steve Lianoglou
>> <mailinglist.honeypot at gmail.com> wrote:
>> > Hi,
>> >
>> > On Thu, Apr 22, 2010 at 4:17 PM, Ivan Gregoretti <ivangreg at gmail.com>
>> > wrote:
>> >> Hello everybody,
>> >>
>> >> How do you resize() the ranges of a RangedData object?
>> >>
>> >>
>> >> In the past (IRanges 1.4.11), I could
>> >>
>> >> 1) extend forward 200 bases from the start in '+' ranges OR
>> >> 2) extend backward 200 bases from the end in '-' ranges.
>> >>
>> >> The syntax was something like this:
>> >>
>> >> resize(ranges(A), width = 200, start = A$strand == "+")
>> >>
>> >> In IRanges 1.5.70, the "start" argument of resize() has been
>> >> deprecated and replaced by "fix".
>> >>
>> >> Can somebody show how to get the task accomplished with the new
>> >> resize()?
>> >
>> > I'm pretty sure you use `fix` just like you use start:
>> >
>> > R> strands <- c("+", '-', '+', '-', '-')
>> > R> ir <- IRanges(c(1,10,20,30, 40), width=5)
>> > R> ir
>> > IRanges of length 5
>> >    start end width
>> > [1]     1   5     5
>> > [2]    10  14     5
>> > [3]    20  24     5
>> > [4]    30  34     5
>> > [5]    40  44     5
>> >
>> > R> resize(ir, width=8, fix=ifelse(strands == '+', 'start', 'end'))
>> > IRanges of length 5
>> >    start end width
>> > [1]     1   8     8
>> > [2]     7  14     8
>> > [3]    20  27     8
>> > [4]    27  34     8
>> > [5]    37  44     8
>> >
>> > --
>> > Steve Lianoglou
>> > Graduate Student: Computational Systems Biology
>> >  | Memorial Sloan-Kettering Cancer Center
>> >  | Weill Medical College of Cornell University
>> > Contact Info: http://cbio.mskcc.org/~lianos/contact
>> >
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
>


