[BioC] Summing Views on coverage by base

Martin Morgan mtmorgan at fhcrc.org
Tue Mar 20 22:05:04 CET 2012


On 03/20/2012 01:40 PM, Hervé Pagès wrote:
> Hi Sean,
>
> On 03/20/2012 01:14 PM, Sean Davis wrote:
>> I have a set of Views of equal width (think upstream of tss) and want
>> to sum each base across those views. I can extract each view as an
>> integer vector and create a matrix, but this matrix can get pretty
>> large. I'm missing the skills with SimpleRleViewsList, though, to
>> work directly on that object. Any suggestions?
>
>  > subject <- Rle(rep(c(0L, 1L, 3L, 2L, 18L, 0L), c(3,2,1,5,2,4)))
>  > myViews <- Views(subject, start=4:11, width=5)
>  > myViews
> Views on a 17-length Rle subject
>
> views:
>     start end width
> [1]     4   8     5 [ 1  1  3  2  2]
> [2]     5   9     5 [ 1  3  2  2  2]
> [3]     6  10     5 [ 3  2  2  2  2]
> [4]     7  11     5 [ 2  2  2  2  2]
> [5]     8  12     5 [ 2  2  2  2 18]
> [6]     9  13     5 [ 2  2  2 18 18]
> [7]    10  14     5 [ 2  2 18 18  0]
> [8]    11  15     5 [ 2 18 18  0  0]
>
> This may be fast enough if you don't have too many columns:
>
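> viewColSums <- function(x)
> {
>     ## Sum position i of every view, for i = 1..width of the first view
>     ## (all views are assumed to have the same width). subject(x) is
>     ## used so the function does not rely on a global 'subject'.
>     sapply(seq_len(width(x)[1L]),
>            function(i)
>                sum(subject(x)[start(x) + i - 1L]))
> }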
>
>  > viewColSums(myViews)
> [1] 15 32 49 46 44


Reduce("+", myViews)
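
That is, add the views together one at a time, so that only a running total of one view's width is held rather than the full matrix. A minimal base R sketch of the same fold, assuming the toy subject from the example above is stored as a plain integer vector (window_at, starts, and col_sums are hypothetical names used only for illustration; IRanges is not needed to run it):

```r
# Toy data copied from the example earlier in the thread,
# expanded to a plain integer vector instead of an Rle.
subject <- rep(c(0L, 1L, 3L, 2L, 18L, 0L), c(3, 2, 1, 5, 2, 4))
starts  <- 4:11   # start position of each view
width   <- 5L     # common width of all views

# Hypothetical helper: extract one window as an integer vector.
window_at <- function(s) subject[s + seq_len(width) - 1L]

# Fold over the starts, keeping only a running total of length `width`.
col_sums <- Reduce(function(acc, s) acc + window_at(s),
                   starts, integer(width))
print(col_sums)

# Cross-check against the explicit matrix approach (fine at this size).
m <- t(sapply(starts, window_at))
stopifnot(all(col_sums == colSums(m)))
```

On this toy input the result should agree with the viewColSums output quoted above (15 32 49 46 44). Whether the fold is faster than the sapply version on real data will depend on the number of views versus their width.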

>
> Then if your SimpleRleViewsList object is not too long (1 element per
> chromosome?), you can sapply( , viewColSums) on it.
>
> Maybe we should make viewColSums the "colSums" method for RleViews
> objects? (and eventually implement it in C?)
>
> Cheers,
> H.
>
>>
>> Thanks,
>> Sean
>>
>>> sessionInfo()
>> R Under development (unstable) (2012-01-19 r58141)
>> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] GenomicRanges_1.7.30 IRanges_1.13.28 BiocGenerics_0.1.12
>>
>> loaded via a namespace (and not attached):
>> [1] stats4_2.15.0 tools_2.15.0
>>
>
>


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


