[Bioc-sig-seq] RangedData objects. Redefining widths with conditions.

Ivan Gregoretti ivangreg at gmail.com
Fri Apr 23 16:42:49 CEST 2010


Hi Steve,

What you showed works, no question, but I found that resize() is not
set up for convenient use with RangedData objects.

For example, consider a more biological data set:

Z <- RangedData(
       RangesList(
          chrA = IRanges(start = c(1, 4, 6), width=c(3, 2, 4)),
          chrB = IRanges(start = c(1, 3, 6), width=c(3, 3, 4))),
       score = c( 2, 7, 3, 1, 1, 1 ),
       strand= c('+','+','-','+','-','-') )

> Z
RangedData with 6 rows and 2 value columns across 2 spaces
        space    ranges |     score      strand
  <character> <IRanges> | <numeric> <character>
1        chrA    [1, 3] |         2           +
2        chrA    [4, 5] |         7           +
3        chrA    [6, 9] |         3           -
4        chrB    [1, 3] |         1           +
5        chrB    [3, 5] |         1           -
6        chrB    [6, 9] |         1           -

Here is the inconvenience with resize():

resize(Z, width=200, fix=ifelse(Z$strand=='+','start','end'))
Error in function (classes, fdef, mtable)  :
  unable to find an inherited method for function "resize", for
signature "RangedData"

What does work is calling it on ranges(Z) rather than on Z itself:
> resize(ranges(Z), width=200, fix=ifelse(Z$strand=='+','start','end'))
SimpleRangesList of length 2
$chrA
IRanges of length 3
    start end width
[1]     1 200   200
[2]     4 203   200
[3]  -190   9   200

$chrB
IRanges of length 3
    start end width
[1]     1 200   200
[2]     3 202   200
[3]  -190   9   200

But as you see, the result is no longer a RangedData object. You have to coerce it back:

> as(resize(ranges(Z), width=200, fix=ifelse(Z$strand=='+','start','end')), 'RangedData')
RangedData with 6 rows and 0 value columns across 2 spaces
        space      ranges |
  <character>   <IRanges> |
1        chrA [   1, 200] |
2        chrA [   4, 203] |
3        chrA [-190,   9] |
4        chrB [   1, 200] |
5        chrB [   3, 202] |
6        chrB [-190,   9] |

Now I have a RangedData object, but the value columns are still lost. I
have to reconstruct them.

[warning: the following command is obnoxious]


> as(cbind(as.data.frame(as(resize(ranges(Z), width=200, fix=ifelse(Z$strand=='+','start','end')), 'RangedData')), as.data.frame(Z)[,-(1:4)]), 'RangedData')
RangedData with 6 rows and 2 value columns across 2 spaces
        space      ranges |     score   strand
  <character>   <IRanges> | <numeric> <factor>
1        chrA [   1, 200] |         2        +
2        chrA [   4, 203] |         7        +
3        chrA [-190,   9] |         3        -
4        chrB [   1, 200] |         1        +
5        chrB [   3, 202] |         1        -
6        chrB [-190,   9] |         1        -

Granted, it works, but wouldn't this be more convenient?

resize(Z, width=200, fix=ifelse(Z$strand=='+','start','end'))

Z is a tiny toy example; biological sets regularly run to millions of
rows. My set is over 100 million rows. As I write this, my 144GB RAM
machine is doing the resizing the 'long way round', as obnoxiously
shown above. Still working...

I wonder if there is a 'cheaper' way to resize a large RangedData
instance. A better solution would be to upgrade resize(), but I am not
that R-skilled. I hope the developers will consider it.
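
For what it is worth, the arithmetic itself is simple enough to sketch
in base R on plain vectors, which keeps the value columns next to the
ranges. This is only a sketch under assumptions: resize_by_strand() is
a hypothetical helper, not an IRanges function, and the data frame z
merely mirrors the toy object Z.

```r
# Hypothetical helper (not part of IRanges): strand-aware resize on plain vectors.
resize_by_strand <- function(start, end, strand, width) {
  plus <- strand == "+"
  # '+' ranges keep their start fixed; all other ranges keep their end fixed
  new_start <- ifelse(plus, start, end - width + 1)
  new_end   <- ifelse(plus, start + width - 1, end)
  data.frame(start = new_start, end = new_end)
}

# Same toy data as Z, held in an ordinary data frame
z <- data.frame(
  space  = rep(c("chrA", "chrB"), each = 3),
  start  = c(1, 4, 6, 1, 3, 6),
  end    = c(3, 5, 9, 3, 5, 9),
  score  = c(2, 7, 3, 1, 1, 1),
  strand = c("+", "+", "-", "+", "-", "-"),
  stringsAsFactors = FALSE
)

# Overwrite the coordinates in place; score and strand are untouched
resized <- resize_by_strand(z$start, z$end, z$strand, width = 200)
z$start <- resized$start
z$end   <- resized$end
z
```

Here the strand is matched row by row, so row 5 (chrB, '-') becomes
[-194, 5]. The resize(ranges(Z), ...) output above shows [3, 202] for
that row, which looks as if the six-element fix vector was recycled
within each space. I have not verified that, so check the strand
alignment before trusting either result.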

Thank you,

Ivan

Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health



On Thu, Apr 22, 2010 at 5:11 PM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Hi,
>
> On Thu, Apr 22, 2010 at 4:17 PM, Ivan Gregoretti <ivangreg at gmail.com> wrote:
>> Hello everybody,
>>
>> How do you resize() the ranges of a RangedData object?
>>
>>
>> In the past (IRanges 1.4.11), I could
>>
>> 1) extend forward 200 bases from the start in '+' ranges OR
>> 2) extend backward 200 bases from the end in '-' ranges.
>>
>> The syntax was something like this:
>>
>> resize(ranges(A), width = 200, start = A$strand == "+")
>>
>> In IRanges 1.5.70, the "start" argument of resize() has been
>> deprecated and replaced by "fix".
>>
>> Can somebody show how to get the task accomplished with the new resize()?
>
> I'm pretty sure you use `fix` just like you use start:
>
> R> strands <- c("+", '-', '+', '-', '-')
> R> ir <- IRanges(c(1,10,20,30, 40), width=5)
> R> ir
> IRanges of length 5
>    start end width
> [1]     1   5     5
> [2]    10  14     5
> [3]    20  24     5
> [4]    30  34     5
> [5]    40  44     5
>
> R> resize(ir, width=8, fix=ifelse(strands == '+', 'start', 'end'))
> IRanges of length 5
>    start end width
> [1]     1   8     8
> [2]     7  14     8
> [3]    20  27     8
> [4]    27  34     8
> [5]    37  44     8
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>


