[Bioc-sig-seq] weighted coverage on GRanges object

Hervé Pagès hpages at fhcrc.org
Wed May 18 02:14:42 CEST 2011


Hi Janet,

On 11-05-17 11:40 AM, Janet Young wrote:
> Hi,
>
> Yes, it would be great for us to be able to simply pass in weights as a vector rather than the list (it seems very instinctive to me to put the scores in an elementMetadata column, so that's mostly where I'd like to take weights from).

Ok now this works in GenomicRanges 1.4.4:

   > library(GenomicRanges)

   > dat2 <- GRanges(seqnames=rep(c("chr2L","chr3L"), c(4,3)),
                     IRanges(start=c(  (1:4)*5 , (1:3)*5),width=20),
                     score=c(3,5,10,20,6,10,30))

   > coverage(dat2, weight=elementMetadata(dat2)$score)
   SimpleRleList of length 2
   $chr2L
   'integer' Rle of length 39 with 8 runs
     Lengths:  4  5  5  5  5  5  5  5
     Values :  0  3  8 18 38 35 30 20

   $chr3L
   'integer' Rle of length 34 with 6 runs
     Lengths:  4  5  5 10  5  5
     Values :  0  6 16 46 40 30

The old behaviour is preserved i.e. 'weight' can still be a list.
The 'shift' argument is handled the same way as 'weight' i.e. it
can also be a numeric vector in addition to a list. (The code that
checks and preprocesses 'shift' and 'weight' is the same.)

Details in ?GRanges

GenomicRanges 1.4.4 should propagate to the public repo tomorrow
morning.

Cheers,
H.

 > sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_CA.UTF-8
  [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicRanges_1.4.4 IRanges_1.10.4

>
> thanks,
>
> Janet
>
>
> On May 13, 2011, at 10:41 AM, Hervé Pagès wrote:
>
>> Hi Janet,
>>
>> Not sure it makes sense that the 'weight' arg for the coverage,GRanges
>> method is required to be a list to start with. Generally speaking, one
>> would like to be able to associate a weight to each range, and this can
>> simply be achieved by passing a numeric vector of length the number of
>> ranges. If the user wants to define the weights on a per-chromosome
>> basis, then this would better be done before, when preparing the weight
>> vector, and it's easy.
>>
>> Anyway, I'll keep support for list but will also add support for numeric
>> vector.
>>
>> Thanks for your feedback!
>> H.
>>
>>
>> On 11-05-06 12:48 PM, Janet Young wrote:
>>> Hi there,
>>>
>>> I have a suggestion - is it possible to make it a little easier to directly calculate weighted coverage on a GRanges object that has regions on>1 chromosome?   I think the code below explains what I mean.
>>>
>>> I've figured out a workaround, but it took me quite a while to get there - perhaps putting the coercion in the underlying coverage/weights code will help others avoid that in future?
>>>
>>> thanks very much,
>>>
>>> Janet
>>>
>>> -------------------------------------------------------------------
>>>
>>> Dr. Janet Young (Trask lab)
>>>
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Avenue N., C3-168,
>>> P.O. Box 19024, Seattle, WA 98109-1024, USA.
>>>
>>> tel: (206) 667 1471 fax: (206) 667 6524
>>> email: jayoung  ...at...  fhcrc.org
>>>
>>> http://www.fhcrc.org/labs/trask/
>>>
>>> -------------------------------------------------------------------
>>>
>>>
>>> library(GenomicRanges)
>>>
>>>
>>> dat1<- GRanges(seqnames="chr2L",IRanges(start=((1:4)*5),width=20),score=c(3,5,10,20) )
>>>
>>> dat2<- GRanges(seqnames=rep(c("chr2L","chr3L"),c(4,3)),IRanges(start=c(  (1:4)*5 , (1:3)*5 ),width=20),score=c(3,5,10,20,6,10,30) )
>>> seqlengths(dat2)<- c(50,40)
>>>
>>> #### this works
>>> coverage(dat1,weight=list(elementMetadata(dat1)$score))
>>>
>>> ## SimpleRleList of length 1
>>> ## $chr2L
>>> ## 'integer' Rle of length 39 with 8 runs
>>> ##   Lengths:  4  5  5  5  5  5  5  5
>>> ##   Values :  0  3  8 18 38 35 30 20
>>>
>>>
>>> #### this works
>>> myscores<- list(c(3,5,10,20), c(6,10,30))
>>> coverage(dat2,weight=myscores)
>>>
>>> ## SimpleRleList of length 2
>>> ## $chr2L
>>> ## 'integer' Rle of length 50 with 9 runs
>>> ##   Lengths:  4  5  5  5  5  5  5  5 11
>>> ##   Values :  0  3  8 18 38 35 30 20  0
>>>
>>> ## $chr3L
>>> ## 'integer' Rle of length 40 with 7 runs
>>> ##   Lengths:  4  5  5 10  5  5  6
>>> ##   Values :  0  6 16 46 40 30  0
>>>
>>> ##### but it'd be great if this could work too....
>>> coverage(dat2,weight=list(elementMetadata(dat2)$score))
>>> ## Error in normargWeight(weight, nseq) : 'weight' is longer than 'x'
>>>
>>> ### for now I'll coerce my scores into a list by chromosome: does this seem to be a reliable way to do it?  this is the part it took me a while to figure out on my own...
>>> myscores<- split( elementMetadata(dat2)$score , as.character(seqnames(dat2)) )
>>> coverage(dat2,weight=myscores)
>>>
>>> sessionInfo()
>>>
>>> R version 2.13.0 (2011-04-13)
>>> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] GenomicRanges_1.4.3 IRanges_1.10.0
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioc-sig-sequencing mailing list